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Abstract: 

Graphical models are popular statistical tools which are used to represent dependent 
or causal complex systems. Statistically equivalent causal or directed graphical models 
are said to belong to a Markov equivalent class. It is of great interest to describe 
and understand the space of such classes. However, with currently known algorithms, 
sampling over such classes is only feasible for graphs with fewer than approximately 
20 vertices. 

In this paper, we design reversible irreducible Markov chains on the space of Markov 
equivalent classes by proposing a perfect set of operators that determine the transitions 
of the Markov chain. The stationary distribution of a proposed Markov chain has a 
closed form and can be computed easily. Especially, we construct a concrete perfect set 
of operators on sparse Markov equivalence classes by introducing appropriate condi- 
tions on each possible operator. Algorithms and their accelerated version are provided 
to efficiently generate Markov chains and to explore properties of Markov equivalence 
classes of sparse directed acyclic graphs (DAGs) with thousands of vertices. 

We find experimentally that in most Markov equivalence classes of sparse DAGs, 
(1) most edges are directed, (2) most undirected subgraphs are small, and (3) the 
number of these undirected subgraphs grows approximately linearly with the number 
of vertices. 
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1. Introduction 

Graphical models based on directed acyclic graphs ( DAGs, denoted as V) are widely used 
to represent causal or dependent relationships in various scientific investigations, such as 
bioinformatics, epidemiology, sociology and business [11, 12, 17, 18, 22, 27, 31]. A DAG 
encodes the independence and conditional independence restrictions (Markov properties) of 
variables. However, because different DAGs can encode the same set of independencies or 
conditional independencies, most of the time we cannot distinguish DAGs via observational 
data [28]. A Markov equivalence class is used to represent all DAGs that encode the same 
dependencies and independencies [2, 6, 29]. A Markov equivalence class can be visualized (or 
modeled) and uniquely represented by a completed partial directed acyclic graph (completed 
PDAG for short) [6] which possibly contains both directed edges and undirected edges [20]. 
There exists a one-to-one correspondence between completed PDAGs and Markov equiva- 
lence classes [2]. The completed PDAGs are also called essential graphs by Andersson et al. 
[2] and maximally oriented graphs by Meek [24]. 

A set of completed PDAGs can be used as a model space. The modeling task is to discover 
a proper Markov equivalence class in the model space [3, 4, 7, 8, 16, 23]. Understanding the 
set of Markov equivalence classes is important and useful for statistical causal modeling 
[13, 14, 19]. For example, if the number of DAGs is large for Markov equivalence classes in 
the model space, searching based on unique completed PDAGs could be substantially more 
efficient than searching based on DAGs [6, 23, 25]. Moreover, if most completed PDAGs 
in the model space have many undirected edges (with non- identifiable directions), many 
interventions might be needed to identify the causal directions [10, 15]. 

Because the number of Markov equivalence classes increases super-exponentially with the 
number of vertices (eg. more than 10^* classes with 10 vertices) [14] , it's hard to study sets 
of Markov equivalence classes. To our knowledge, only whole set that contains all completed 
PDAGs with a small given number of vertices (< 10) have been studied thoroughly in 
the literature [13, 14, 30]. Moreover, these studies focus on the size of Markov equivalence 
classes, which is defined as the number of DAGs in a Markov equivalence class. Gillispie and 
Perlman [14] obtain the true size distribution of all Markov equivalence classes with a given 
number (10 or fewer) of vertices by listing all classes. Pena [30] designs a Markov chain to 
estimate the proportion of the equivalence classes containing only one DAG for graphs with 
20 or fewer vertices. 

In recent years, sparse graphical models have become popular tools for fitting high- 
dimensional multivariate data. The sparsity assumption introduces restrictions on the model 
space; a standard restriction is that the number of edges in the graph be less than a small 
multiple of the number of vertices. It is thus both interesting and important to be able to 
explore the properties of subsets of sparse graphical models. In this paper, we introduce 
a reversible irreducible Markov chain on general Markov equivalence classes. This Markov 
chain allows one to study properties of the sets that contain sparse Markov equivalence 
classes in a computationally efficient manner, for sparse graphs with thousands of vertices. 

1.1. A Markov equivalence class and its representation 

A graph Q is defined as a pair (V, E), where V — {xi, • • • , Xp} denotes the vertex set with p 
variables and E denotes the edge set. Let ng — \E\ be the number of edges iuQ. A directed 
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(undirected) edge is denoted as —> or -;—(—). A graph is directed (undirected) if all of its 
edges are directed (undirected). A sequence {xi,X2, ■ ■ ■ ,Xk) of distinct vertices is called a 
path from xi to Xk if either Xi — >■ a^i+i or Xi — Xi+i is in E for alH = 1, . . . , fc — 1. A path is 
partially directed if at least one edge in it is directed. A path is directed (undirected) if all 
edges are directed (undirected). A cycle is a path from a vertex to itself. 

A directed acyclic graph (DAG), denoted by V, is a directed graph which does not contain 
any directed cycle. Let r be a subset of V. The subgraph Vr — (r, Et) induced by the subset 
T has vertex set r and edge set Et, the subset of E which contains the edges with both 
vertices in r. A subgraph x — )■ z ?/ is called a v-structure if there is no edge between x and 
y. An acyclic partially directed graph (PDAG), denoted by is a graph with no directed 
cycle. 

A graphical model consists of a DAG and a joint probability distribution. With the 
graphical model, all conditional independencies implied by the joint probability distribution 
can be read from the DAG. These conditional independencies are called Markov properties 
of the DAG [28]. A Markov equivalence class (MEG) is a set of DAGs that have the same 
Markov properties. Let the skeleton of an arbitrary graph Q be the undirected graph with 
the same vertices and edges as G, regardless of their directions. Verma and Pearl [32] proved 
the following characterization of Markov equivalence classes: 

Lemma 1 (Verma and Pearl [32]). Two DAGs are Markov equivalent if and only if they 
have the same skeleton and the same v-structures. 

This lemma implies that, among DAGs in an equivalence class, some edge orientations 
may vary, while others will be preserved (for example, those involved in a V-structure). 
Consequently, a Markov equivalence class can be represented uniquely by a completed PDAG, 
defined as follows: 

Definition 1 (Completed PDAG [6]). The completed PDAG of a DAG V, denoted as C, 
is a PDAG that has the same skeleton as T>, and an edge is directed in C if and only if it 
has the same orientation in every equivalent DAG ofT>. 

In other words, a completed PDAG of a DAG T> has the same skeleton as T> and keeps 
the v-structures of V. Another popular name of a completed PDAG is "essential graph" 
introduced by Andersson et al. [2], who show that all directed edges in a completed PDAG 
must be "strongly protected" , defined as follows: 

Definition 2. Let Q — {V,E) he a graph. A directed edge v ^ u € E is strongly protected 
in G if v ^ u Cz E occurs in at least one of the four induced subgraphs of Q 4^ Figure 1: 



(a) : V 





w 



w 



w 



Fig 1. Four configurations where v —> u is strongly protected in Q. 



Andersson et al. [2] also introduce necessary and sufficient conditions for a graph to be 
an essential graph (equivalently, a completed PDAG), which we include in Lemma 2 in 



Y.B. He et al. /Reversible MCMC on markov equivalence classes of sparse DAGs 



4 



Appendix A. 

If we delete all directed edges from a completed PDAG, we are left with several isolated 
undirected subgraphs. Each isolated undirected subgraph is a chain component of the com- 
pleted PDAG. Observational data is not sufficient to learn the directions of undirected edges 
of a completed PDAG; one must perform additional intervention experiments. In general, 
the size of a chain component is a measure of "complexity" of causal learning; the larger 
chain components the more interventions necessary to learn the underlying causal graph 
[15]. 

In learning graphical models [6] or studying Markov equivalence classes [30], Markov 
chains on completed PDAGs play an important role. We briefly introduce the existing meth- 
ods to construct Markov chains on completed PDAGs in the next subsection. 

1.2. Markov chains on completed PDAGs 

To construct a Markov chain on completed PDAGs, we need to generate the transitions 
among them. In general, an operator that can modify the initial completed PDAG locally 
can be used to carry out a transition [6, 25, 30]. Let C be a completed PDAG. We consider 
six types of operators on C: inserting an undirected edge (denoted by InsertU ), deleting 
an undirected edge (DeleteU), inserting a directed edge (InsertD), deleting a directed edge 
(DeleteD), making a v-structure (MakeV) and removing a v-structure (RemoveV). We call 
InsertU, DeleteU, InsertD, DeleteD, MakeV, and RemoveV the types of operators. An op- 
erator on a given completed PDAG is determined by two parts: its type and the modified 
edges. For example, the operator "InsertU a;— y" on C represents inserting an undirected edge 
X — y to C, and x — y is the modified edge of the operator. A modified graph of an operator 
is the same as the initial completed PDAG, except for the modified edges of the operator. 
A modified graph might (not) be a PDAG or completed PDAG. Example 1 illustrates six 
operators on a completed PDAG C and their corresponding modified graphs. 

Example 1. Figure 2 displays six operators: InsertU x — z, DeleteU y — z, InsertD 
X — > f , DeleteD z — w, MakeV z ^ y u, and RemoveV z v u. After inserting an 
undirected edge x — z into the initial graph C, we get a modified graph denoted as Vi in 
Figure 2. By applying the other five operators to C in Figure 2 respectively, we can obtain 
other five corresponding modified graphs 7^2 , T'a , "^4 , 'Ps , and Vq. Here the operator "MakeV 
z ^ y <— modifies z — y — utoz— >y-s— u and the operator "Remove z — >■ w u" 
modifies z^v^utoz — v — u. Notice that a modified graph might not be a PDAG though 
all modified graphs in this example are PDAGs. 

In the above example, we see that the modified graph of an operator, denoted by V, 
might be a PDAG, but might not be a completed PDAG. For example, the modified graphs 
7^4, and in Figure 2 are not completed PDAGs because the directed edge y — )■ w is not 
strongly protected; and V5 is not completed PDAG because z ^ y — x occurs, which makes 
condition (iii) in Lemma 2 not hold. A consistent extension of a PDAG is a directed 
acyclic graph (DAG) on the same underlying set of edges, with the same orientations on 
the directed edges of V and the same set of v-structures [9, 33]. According to Lemma 1, all 
consistent extensions of a PDAG V, if they exist, belong to a unique Markov equivalence 
class. Hence if the modified graph of an operator is a PDAG and has a consistent extension, 
it can result in a completed PDAG that corresponds to a unique Markov equivalence class. 
We call it the resulting completed PDAG of the operator. 
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InsertU x — z InsertD x ^ v MakeV 2; — > 3/ « 

X XX 




Fig 2. Examples of six operators of P DAG C. Vi to "Pg are the modified graphs of six operators. 



To obtain a valid transition from one completed PDAG C to another, Chickering [6] 
introduces the concept of validity for an operator on C. A valid operator is defined as below. 

Definition 3 (Valid operator). An operator on C is valid if (1) the modified graph of the 
operator is a PDAG and has a consistent extension, and (2) all modified edges in the modified 
graph occur in the resulting completed PDAG of the operator. 

The first condition in Definition 3 guarantees that a valid operator results in a completed 
PDAG. The second condition guarantees that the valid operator is "effective"; that is, the 
change brought about by the operator occurs in the resulting completed PDAG. The second 
condition is not trivial. Consider a three-vertex completed PDAG {x — y z) that has only 
one undirected edge x — y; after applying the operator "InsertD y ^ we get a modified 
graph X — y -> z. Clearly, x — y — > z is a PDAG and has a consistent extension (say 
X y z). However, the corresponding resulting completed PDAG is x — y — z and 
the operator "InsertD y — ?> z" is not "effective". Here we notice that the second condition 
is implied by the context in Chickering [6]. Below we briefly introduce how to obtain the 
resulting completed PDAG of a valid operator from the modified graph. 

Verma and Pearl [33] and Meek [24] introduce an algorithm for finding the completed 
PDAG from a "pattern" (given skeleton and v-structures). This method can be used to 
create the completed PDAG from a DAG or a PDAG. They first undirect every edge, except 
for those edges that participate in a v-structure. Then they choose one of the undirected 
edges and direct it if the corresponding directed edge is strongly protected, as shown in 
Figure 1 (a), (c) or (d). The algorithm terminates when there is no undirected edge that 
can be directed. 

Chickering [6] propose an alternative approach to obtain the completed PDAG of a valid 
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operator from its modified grapli. The method includes two steps. The first step generates 
a consistent extension (a DAG) of the modified graph (a PDAG) using algorithm described 
in Dor and Tarsi [9]. The second step creates a completed PDAG corresponding to the 
consistent extension. We recall Dor and Tarsi'a algorithm in Algorithm 3 in Appendix A. 
Chickering's algorithms [5] for the second step are also summarized in Algorithm 4 in Ap- 
pendix A for completeness of the paper. 

The approach proposed by Chickering [5] is "more complicated but more efficient" [24] 
than Meek's method described above. Hence in this paper, we use Chickering's approach to 
obtain the resulting completed PDAG of a valid operator from its modified graph. Below, 
we give an example to illustrate this approach. 



Initial modified consistent the resulting 

completed PDAG graph extension completed PDAG 



(apply 

operator) / /V step 1 /rf'V. step 2 







(n) 



Fig 3. Example for constructing the unique resulting completed PDAG of a valid operator. An operator 
"Remove z v <— u" in Figure 2 is applied to the initial completed PDAG C and finally results in the 
resulting completed PDAG Ci. 

Example 2. Consider the initial completed PDAG C and the operator "Remove z ^ v <— 
u" in Figure 2. We illustrate in Figure 3 the steps of Chickering's approach that generates the 
resulting completed PDAG Ci by applying "Remove z v ^ u" to C. The first step (step 
1) extends the modified graph (a PDAG Ve) to a consistent extension {T>q) via Algorithm 
3 in Appendix A. The second step (step 2) constructs the resulting completed PDAG Ci of 
the operator "Remove z — ^ w -s— u" from the DAG Vq via Algorithm 4 in Appendix A. 

With a set of valid operators, a Markov chain on completed PDAGs can be constructed. 
Let Sp be the whole set of completed PDAGs with p vertices, 5 be a given subset of Sp. 
For any completed PDAG C G S, let Oc be a set of valid operators of interest to be defined 
later on C. A set of valid operators on S is defined as 



O^lJOc. (1.1) 



ces 

Here we notice that each operator in O is specific to the completed PDAG that the operator 
applies to. A Markov chain {et} on S based on the set O can be defined as follows. 

Definition 4 (A Markov Chain {et} on S). The Markov Chain {et} determined by a set 
of valid operators O is generated as follows: start at an arbitrary completed PDAG, denoted 
as eo = Co G S, and repeat the following steps for t — 0,1, ■ ■ ■ , 
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1. At the t-th step we are at a completed PDAG et- 

2. We choose an operator uniformly from Oe, ; if the resulting completed PDAG Ct+i 
of is in S , move to Ct+i and set et+i — Ct+i, otherwise we stay at et and set 
et+i = et- 

There are other transitions available in the literature. For example, the transitions pro- 
posed by Pena [30] move from completed PDAGs to their modified graphs only if the mod- 
ified graph happens to be its resulting completed PDAG. Obviously, the transitions used 
in Definition 4 are superior to Pena's because they provides more transition states for any 
completed PDAG. 

The set S is the finite state space of chain {et}. Clearly, the sequence of completed PDAGs 
{et : t = 0, 1, • • • } in Definition 4 is a discrete-time Markov chain [21, 26]. Let p^^, be the 
one-step transition probability of {et} from C to C for any two completed PDAGs C and C 
in S. A Markov chain {et) is irreducible if it can reach any completed PDAG starting at any 
state in S. If {et} is irreducible, there exists a unique stationary distribution tt — (tt^,C G 5) 
satisfying balance equations [26] 

TT, = ^ T^c'Pc'c for aU CeS. (1.2) 
c'es 

An irreducible chain et is reversible if there exists a probability distribution tt such that 

Standard results imply that tt is the unique stationary distribution of the discrete-time 
Markov chain {et} if it is finite, reversible, and irreducible. Moreover, the stationary proba- 
bilities TTj, can be calculated efficiently if the Markov chain satisfies Equation (1.3). 

The properties of the Markov chain {et} given in Definition 4 depend on the operator 
set O. To implement score-based searching in the whole set of Markov equivalence classes, 
Chickering [6] introduces a set of operators with types of InsertU, DeleteU, InsertD, DeleteD, 
MakeV, or ReverseD (reversing the direction of a directed edge), subject to some validity 
conditions. Unfortunately, the Markov chain in Definition 4 is not reversible if the set of 
Chickering's operators is used. Our goal is to design a reversible Markov chain, as it makes 
it easier to compute the stationary distribution, and thereby to study the properties of a 
subset of Markov equivalence classes. 

In section 2, we first discuss the properties of an operator set O needed to guarantee 
that the Markov chain is reversible. Section 2 also explains how to use the samples from 
the Markov chain to study properties of any given subset of Markov equivalence classes. 
In Section 3 we focus on studying sets of sparse Markov equivalence classes. Finally, in 
Section 4, we report the properties of directed edges and chain components in sparse Markov 
equivalence classes with up to one thousand of vertices. 

2. Reversible Markov chains on Markov equivalence classes 

Let S be any subset of the whole set Sp, and O be a set of operators on S defined in 
Equation (1.1). As in Definition 4, we can obtain a Markov chain denoted by {et}. We first 
discuss four properties of O that guarantee that {ct} is reversible and irreducible. They are 
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validity, distinguishability, irreducibility and reversibility. We call a set of operators perfect 
if it satisfies these four properties. Then we give the stationary distribution of {et} when O 
is perfect and show how to use {et} to study properties of S. 

2.1. A reversible Markov chain based on a perfect set of operators 

Let p^^, be a one-step transition probability of {et} from C to C for any two completed 
PDAGs C and C in S. In order to formulate p^^, clearly, we introduce two properties of O: 
Validity and Distinguishability. 

Definition 5 (Validity). Given S and any completed PDAG C in S , a set of operators O 
on S is valid if for any operator (o without confusion below) in Oc, o is valid according 
to Definition 3 and the resulting completed PDAG obtained by applying o to C is also in S. 

According to Definition 5, if a set of operators O on 5 is valid, we can move to a new com- 
pleted PDAG in each step of {e^} and the one-step transition probability of any completed 
PDAG to itself is zero: 



For a set of valid operators O and any completed PDAG C in S, we define the resulting 
completed PDAGs of the operators in Oc as the direct successors of C. For any direct 
successor of C, denoted by C, we obtain p^^, clearly as in Equation (2.2) if O has the 
following property. 

Definition 6 (Distinguishability). A set of valid operators O on S is distinguishable if for 
any completed PDAG C in S, different operators in Oc will result in different completed 



If O is distinguishable, for any direct successor of C, denoted by C, there is a unique 
operator in Oc that can transform C to C Thus, the number of operators in Oc is the same 
as the number of direct successors of C. Sampling operators from Oc uniformly generates 
a uniformly random transition from C to its direct successors. By denoting M(Oc) as the 
number of operators in Oc , we have 



We introduce this property because it makes computation of efficient: if O is distin- 
guishable, we know p^^, right away from M{Oc)- 

In order to make sure the Markov chain {et} is irreducible and reversible, we introduce 
two more properties of O: irreducibility and reversibility. 

Definition 7 (Irreducibility). A set of operators O on S is irreducible if for any two com- 
pleted PDAGs C,C' G iS, there exists a sequence of operators in O such that we can obtain 
C' from C by applying these operators sequentially. 

If O is irreducible, starting at any completed PDAG in S, we have positive probability 
to reach any other completed PDAG via a sequence of operators in O. Thus, the Markov 
chain {et} is irreducible. 



Pec — 0, for any completed PDAG C E S. 



(2.1) 



PDAGs. 




l/M{Oc), C is a direct successor of C € 5; 
0, otherwise. 



(2.2) 
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Definition 8 (Reversibility). A set of operators O on S is reversible if for any completed 
PDAG C G 5 and any operator a G Oc with C being the resulting completed PDAG of a, 
there is an operator o' G Oc such that C is the resulting completed PDAG of d . 

If tlie set of operators O on 5 is valid, distinguishable and reversible, for any pair of 
completed PDAGs C,C' G 5, C is also a direct successor of C if C is a direct successor of C. 
For any C G 5 and any of its direct successors C, we have 



Clearly, Equation (1.3) holds for tt^ in Equation (2.4) if O is valid, distinguishable and 
reversible, tt^, is the unique stationary distribution of {ct} if it is also irreducible [1, 21, 26]. 

In the following proposition, we summarize our results about the Markov Chain {et} on 
iS, and give its stationary distribution. 

Proposition 1 (Stationary distribution of {et}). Let S be any given set of completed 
PDAGs. The set of operators is defined as O ^ Uce5 where Oq is a set of operators on 
C for any C in S. Let M{Oc) be the number of operators in Oc - For the Markov chain {et} 
on S generated according to Definition 4, if O is perfect, that is, the properties - validity, 
distinguishability , reversibility and irreducibility hold for O, then 

L the Markov chain {cf} is irreducible and reversible, 

2. the distribution tt^ in Equation (2.4) is the unique stationary distribution o/{et} and 



The challenge is to construct a concrete perfect set of operators. In Section 3, we carry 
out such a construction for a set of Markov equivalence classes with sparsity constraints 
and provide algorithms to obtain a reversible Markov chain. We now show that a reversible 
Markov chain can be used to compute interesting properties of a completed PDAG set S. 

2.2. Estimating the properties of S by a perfect Markov chain 

For any C G 5, let /(C) be a real function describing any property of interest of C, and 
the random variable u be uniformly distributed on 5. In order to understand the property 
of interest, we compute the distribution of f{u). For example, if we are interested in the 
proportion of completed PDAGs with at most 10 undirected edges in S, we can define f{u) as 
the number of undirected edges in u and then compute the probability of {/(it) < 10}. The 
proportion of Markov equivalence classes with size of one (equivalently, completed PDAGs 
that are directed) in Sp is studied in the literature [13, 14, 30]. For this purpose, we can define 
f{u) as the size of Markov equivalence classes represented by u and obtain the proportion 
by computing the probability of {/(m) = 1}. 

Let A be any subset of M, the probability of {/(it) G A] is 



p,^, = l/M{Oc) and p^,^ = llM{Oa). 
Let T = Tlic&s -^^(^c): ci'iid define a probability distribution as 

TT, =M(Oc)/r. 



(2.3) 



(2.4) 



cxM(Oc). 



V[f{u)eA) 



|{C:/(C)gA,Cg5}| ^ T.ceshf{c)<,A} 

\s\ \s\ 



(2.5) 
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where 151 is the number of elements in the set S and / is an indicator function defined as 



1, if /(C) e A; 
{f(c)eA} S Q otherwise. 



Let {et\t=i,- - ,N be a reahzation of Markov chain {et\ on S based on a perfect operator 
set O according to Definition 4 and Mt = M{Oet)- Let 7r(et) be the stationary probability 
of Markov chain {cf }. From Proposition 1, we have 7r(et) oc Mj for t — 1, • • • ,N. We can 
use {et, Mt}t=i,--- ,n to estimate the probability of {/(u) G A} by 

(/(u) e ^) = SkW^ (2.6) 

From the ergodic theory of Markov chains [1, 21, 26], we can get Proposition 2 directly. 

Proposition 2. Let S he a given set of completed PDAGs, and assume the set of operators O 
on S is perfect. The Markov chain {et}t=i,--- ,n is obtained according to Definition 4- Then, 
the estimator P^r i{f{u) G A}) in Equation (2.6) converges to P({/(u) £ A}) in Equation 

(2.5) with probability one, that is, 

P (Pn ifiu) eA)^V {f{u) e A) as iV ^ oo) = 1. (2.7) 

Proposition 2 shows that the estimator defined in Equation (2.6) is a consistent estimator 
of P (/(u) e A). We can study any given subset of Markov equivalence classes via Equation 

(2.6) if we can obtain {et}t=i, -- .n and {Mt}t=i ... ^n. We now turn to construct a concrete 
perfect set of operators for a set of completed PDAGs with sparsity constraints and then 
introduce algorithms to run a reversible Markov chain. 



3. A Reversible Markov chain on completed PDAGs with sparsity constraints 

We define a set of Markov equivalence classes Sp with p vertices and at most n edges as 
follows: 

Sp = {C : C is a completed PDAG with p vertices and Uc < n}, (3-1) 

where is the number of edges in C. Recall that Sp denotes the whole set of completed 
PDAGs with p vertices. Clearly, S^ = Sp when n > p{p — l)/2. 

We now construct a perfect set of operators on Sp . Notice that our constructions can be 
extended to adapt to some other sets of completed PDAGs, say, a set of completed PDAGS 
with a given maximum degree. In Section 3.1, we construct the perfect set of operators for 
any completed PDAG in Sp. In Section 3.2, we propose algorithms and their accelerated 
version for efficiently obtaining a Markov chain based on the perfect set of operators. We 
finally provide the proofs of all theorems in this section in Appendix B. 
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3.1. Construction of a perfect set of operators on 

In order to construct a perfect set of operators, we need to define the set of operators on 
each completed PDAG in Sp. Let C be a completed PDAG in Sp. We consider six types 
of operators on C that were introduced in Section 1.2: InsertU, DeleteU, InsertD, DeleteD, 
MakeV, and RemoveV. The operators on C with the same type but different modified edges 
constitute a set of operators. We introduce six sets of operators on C denoted by InsertUc, 
DeleteUc, InsertDc, DeleteDc, MakeVc and RemoveVc in Definition 9. 

First we explain some notation used in Definition 9. Let x and y be any two distinct 
vertices in C. The neighbor set of x denoted by consists of every vertex y with x — y in 
C. The common neighbor set of x and y is defined as N^y — D Ny. a; is a parent of y and 
y is a child of a; if a; — >■ y occurs in C . A vertex u is a common child of x and y if u is a child 
of both X and y. Hx represents the set of all parents of x. 

Definition 9 (Six sets of operators on C). Let C be a completed PDAG in Sp and nc be 

the number of edges in C. We introduce six sets of operators on C: InsertUc DeleteUc, 
InsertDc, DeleteDc, MakeVc and RemoveVc as follows. 

(a) For any two vertices x, y that are not adjacent in C, the operator "InsertU x ~ y" on C 

is in InsertUc if and only if (iui) nc < n; (iu2) "InsertU x ~ y" is valid; (iu^) for 
any u that is a common child of x, y in C, both x u and y ^ u occur in the resulting 
completed PDAG of "InsertU x - y". 

(b) For any undirected edge x — y in C, the operator "DeleteU x — y" on C is in DeleteUc 

if and only if (dui ) "Delete U x — y" is valid. 

(c) For any two vertices x,y that are not adjacent in C, the operator "InsertD x ^ y" on 

C is in InsertDc if and only if (idi) nc < n; (id2) "InsertD x y" is valid; (ids) 
for any u that is a common child of x,y inC, y u occurs in the resulting completed 
PDAG of "InsertD x-^y". 

(d) For any directed edge x ^ y in C, operator "DeleteD x y" on C is in DeleteDc if 

and only if (ddi) "DeleteD x ^ y" is valid; (dd2) for any v that is a parent of y but 
not a parent of x, directed edge v y in C occurs in the resulting completed PDAG of 
"DeleteD x~>y". 

(e) For any subgraph x — z — y in C, the operator "MakeV x z ^ y" on C is in MakeVc 

if and only if (mvi) "MakeV x z ^ y" is valid. 

(f) For any v-structure x z ^ y of C, the operator "RemoveV x ^ z ^ y" on C is in 

RemoveVc if and only if (rvi) n^; = Uy; (rv2) XI^; U N^y = n^\{a;, y}; (rva) every 
undirected path between x and y contains a vertex in N^y . 

Munteanu and Bendou [25] discuss the constraints for the first five types of operators such 
that each one can transform one completed PDAG to another. Chickering [6] introduces the 
necessary and sufficient conditions such that these five types of operators are valid. The 
conditions proposed by Munteanu and Bendou [25] and Chickering [6] guarantee that the 
conditions iu2, dui, id2, ddi and mvi in Definition 9 hold. We employ the conditions 
introduced by Chickering [6] and list them in Lemma 3 in Appendix A. However, given 
only Chickering's conditions, we do not necessarily have a perfect set of operators. We 
introduce several additional conditions in Definition 9, including iui, iu^, idi, id^ and dd2 
for operators in InsertUc, InsertDc and DeleteDc, and rvi,rv2 and rva for operators in 
RemoveVc ■ 
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The set of operators on C denoted by Oc is defined as follows. 

Oc = InsertUc U DeleteUc U InsertDc U DeleteDc U MakeVc U RemoveVc- (3.2) 
Taking the union over all completed PDAGs in Sp, we define the set of operators on Sp 

as 

0= \J Oc, (3.3) 

where Oc is the set of operators in Equation (3.2). In main result of this paper, we show 
that O in Equation (3.3) is a perfect set of operators on Sp. 

Theorem 1 (A perfect set of operators on Sp). O defined in Equation (3.3) is a perfect set 
of operators on Sp . 

The proof of Theorem 1 shows that iu3, ids and dd2 are key conditions in Definition 9 to 
guarantee that O is reversible. We also use Example 3 to show heuristically that conditions 
iu3 and dd2 are necessary for O to be a perfect set of operators. 

Example 3. This example illustrates that O in Equation (3.3) will not be reversible if 
condition iu3 or dd2 is not contained in Definition 9. Consider operator set O defined in 
Equation (3.3) for S5 and the completed PDAG C ^ S5 in Figure 4. We have that operator 
InsertU z — u and DeleteD z ^ v are valid according to Definition 5. As shown in Figure 4, 
InsertU z — u transfers C to the completed PDAG Ci and DeleteD z — >■ u transfers C to the 
completed PDAG €2- However, deleting z — u from Ci will result in an undirected PDAG 
distinct from C and InsertD z —> v is not valid for C2 according to Lemma 3 in Appendix A. 
As a consequence, if O contains InsertU z — u and DeleteD z — >■ w, it will be not reversible 
according to Definition 8. According to Definition 9, these two operators do not appear in 
Oc because they do not satisfy the conditions iua and dd2 respectively. 




« (Ci) ^' (C) (C2) 



Fig 4. Example: Two valid operators bring about irreversibility. It shows valid conditions are not sufficient 
for perfect operator set. 

Now we give a toy example to show how to construct a concrete perfect set of operators 
following Definition 9. 

Example 4. Consider the completed PDAG C in Example 3. Here we introduce the 
procedure to determine InsertUc ■ All possible operators of inserting an undirected edge to 
C include: "InsertU x—z" , "InsertU x—m" , "InsertU x—w" and "InsertU z — m" . The operator 
"InsertU x — v" is not valid according to Lemma 3 in Appendix A since n(a;) 7^ H(w). The 
operator "InsertU z — u" is valid; however, condition iua does not hold according to Example 
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3. According to Definition 9, we have that only "InscrtU x — z" and "InsertU x — u" are in 
InsertUc- Thus InsertUc = {x ~ z,x — u}, where "x — z" denotes "InsertU x — z" in the 
set. Table 1 lists the six sets of operators on C. 




Table 1 

The six sets of operators of C. These operators are perfect. 



InsertUc = {x — z^x — u} 


DeleteUc = {x — y,y — z,y — u} 


InsertDc = {x u} 


DeleteDc = {v ^ v} 


MakeVc = {x — y — z, x ~ y ~ u, 

z — y — u} 


RemoveVc = {« — > n <— 2} 



The preceding section showed how to construct a perfect set of operators. Based on the 
perfect set of operators we can obtain a finite irreducible reversible discrete-time chain. In 
the next subsection, we provide detailed algorithms for obtaining a Markov chain on Sp and 
their accelerated version. 



3.2. Algorithms 

In this subsection, we provide the algorithms in detail to generate a Markov chain on Sp, 
defined in Definition 4 based on the perfect set of operators defined in Equation (3.3). A 
sketch of Algorithm 1 is shown below; some steps of this algorithm are further explained in 
the subsequent algorithms. 



Algorithm 1: Road map to construct a Markov chain on Sp 

Input; 

p, the number of vertices; n, the maximum number of edges; A'^, the length of Markov chain. 
Output: 

{et, Mt}t=i.--- ,jv> where {et} is Markov chain and Mt is the number of operators in Oe^ . 

1 Initialize eo as any completed PDAG in Sp; 

2 for t to AT do 

Step A Construct the set of operators Oe^ in Equation (3.2) via Algorithm 1.1; 

Step B Let Mt be the number of operators in Oe^ ; 

Step C Randomly choose an operator o uniformly from ; 

Step D Apply operator o to et- Set et+i as the resulting completed PDAG of o. 

3 return {et, A/t}t=i,... _jv. 



Step A of Algorithm 1 constructs the sets of operators on completed PDAGs in the chain 
{et}. It is the most difficult step and dominates the time complexity of Algorithm 1. Step 
B and Step C can be implemented easily after Oet is obtained. Step D can be implemented 
via Chickering's method [6] that was mentioned in Section 1.2. We will show that the time 
complexity of obtaining a Markov chain on Sp with length N {{et}t=i, -- ,n) is approximate 
0{Np^) if n is the same order of p. For large p, we also provide an accelerated version that, 
in some cases, can run hundreds of times faster. 
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The rest of this section is arranged as follows. In Section 3.2.1, we first introduce the 
algorithms to implement Step A. In Section 3.2.2 we discuss the time complexity of our 
algorithm, and provide an acceleration method to speed up Algorithm 1. 

3.2.1. Implementation of Step A in Algorithm 1 

A detailed implementation of Step A (to construct OeJ is described in Algorithm 1.1. To 
construct Oet in Algorithm 1.1, we go through all possible operators on et and choose those 
satisfying the corresponding conditions in Definition 9. 

The conditions in Algorithm 1.1 include: iui, ius, ids, dd2, rmi, rvi, rv2 iu2.i, iu2.2, 
dui 1, idi, id2.i, id2.2, id2.3, ddi i and mvi i. The conditions iui, iu3, idi, Ida, dd2, rmi, 
rvi, rv2 can be found in Definition 9. The other conditions (iu2.i, iu2.2), dui.i, (id2.i, id2.2, 
id2.3), ddi.i and mvi.i are equivalent, respectively, to the validity conditions iu2, dui, id2, 
ddi and mvi in Definition 9 and can be found in Lemma 3 in Appendix A. For each possible 
operator, we check the corresponding conditions shown in Algorithm 1.1 one-by-one until 
one of them fails. Below, we introduce how to check these conditions. 



Algorithm 1.1: Construct the operator set Oet for a completed PDAG et. 

Input; A completed PDAG -et with p vertices. 
Output: operator set . 

// Directed-edges et J Undirected-edges et * Pairs-nonadj et > V-structures et , 

Undirected-v-structures et used below are sets of possible modified edges of et . 
Undirected-v-structures et is the set of all subgraphs like x — y — z with x and z not 
adjacent in et\ Pairs— nonadj et is the set of all pairs of vertices that are 
nonadjacent in et . 

1 Set Oet empty set 

2 for each undirected edge x — y in Undirected-edgeSet do 

3 1^ consider operator DeleteU x — x, add it to Oet if dui.i holds, 

4 for each directed edge x —> y in Directed-edgeSet do 

5 consider operator DeleteD x ^ x, add it to Oet both ddi i and dd2 hold (dd2 is checked in 
Algorithm 1.1.3) 

6 for each v-structure x z <— y in V-structureSet do 

7 1^ consider operator RemoveV Xfe — <— xj, add it to Oet i"vi, rv2 and rvs hold, 

8 for each undirected v-structure x — z — y in Undirected-v-structureSet do 

9 1^ consider operator MakeV xj. — > x;, add it to Oet 'f mvi.i holds, 

10 if net < " (i.e., iui or idi holds) then 

11 for each pair {x, y) in Pairs-nonadjet do 

12 consider operator InsertU x — y, add it to Oet ''^ii iu2.i, iu2.2 and iu3 hold (ius is checked 
in Algorithm 1.1.1); 

13 consider InsertD x y, add it to Oet, if idi, id2.i, id2.2, id2.3, and ids hold (ids is checked 
in Algorithm 1.1.2); 

14 consider InsertD x <— y, add it to Oet if ^'^i, id2.i, id2.2i id2.3, and ids hold. 

15 return Oet 



All conditions in Algorithm 1.1 can be classified into four groups: (1) whether two vertex 
sets are equal or not (iu2.i, id2.i, rvi, and rv2), (2) whether a subgraph is a clique or 
not(dui 1, id2.2, and ddi i), (3) whether all partially directed paths or all undirected paths 



Y.B. He et al. /Reversible MCMC on markov equivalence classes of sparse DAGs 



15 



between two vertices contain at least one vertex in a given set (iu2.2, id2.3, mvi i and rvs), 
and (4) others (ius, ids, and dd2). The conditions in the first three groups can be tested 
via classical graph algorithms; we briefly review these below. 

Checking the conditions in the first two groups is trivial and very efficient, because the 
sets involved are small for most completed PDAGs in 5^ when n is of the same order of p. 
To check the conditions in the third group, we just need to check whether there is a partially 
directed path or undirected path between two given vertices not through any vertices in the 
given set. We check this using a depth- first search from the source vertex. When looking for 
an undirected path, we can search within the corresponding chain component that includes 
both the source and the destination vertex. 

The conditions ius, ids and dd2 in the fourth group depend on both et and the resulting 
completed PDAGs of the operators. Intuitively, checking these three conditions requires that 
we obtain the corresponding resulting completed PDAGs. We know that the time complexity 
of getting a resulting completed PDAG of et is 0{pnej [6, 9], where is the number of 
edges in et- To avoid generating resulting completed PDAG, we provide three algorithms to 
check iu3, ids and dd2 respectively. 

In these three algorithms, we use the concept of strongly protected edges, defined in 
Definition 2. Let Ay contain all vertices adjacent to v. To check whether a directed edge 
V ^ u is strongly protected or not in a graph Q, from Definition 2, we need to check whether 
one of the four configurations in Figure 1 occurs in Q. This can be implemented by local 
search in A„ and A„. We know that when a PDAG is sparse, in general, these sets are small, 
so it is very efficient to check whether an edge is "strongly protected" . 

We are now ready to provide Algorithm 1.1.1, Algorithm 1.1.2, and Algorithm 1.1.3 to 
check iu3, ids and dd2 only based on et, respectively. In these three algorithms, we just 
need to check whether a few directed edges are strongly protected or not in Vt+i, which has 
only one or a few edges different from et- We prove in Theorem 2 that these three algorithms 
are equivalent to checking conditions iu3, ids and dd2, respectively. 



Algorithm 1.1.1: Check the condition iu3 in Definition 9 

Input; a completed PDAG et and a valid operator on it: InsertU x — y. 
Output: True or False 

1 Insert x — y to et, get the modified PDAG denoted as "Pt+l, 

2 for each common child u of x and y in Vt+i do 

3 if either x u or y u is not strongly protected in Vt+i then 

4 1^ return False 

5 return True (iu^ holds for InsertU x — y) 



Theorem 2 (Correctness of Algorithms 1.1.1, 1.1.2 and 1.1.3). Let et he a completed PDAG. 
We have the following results. 

(i) Let InsertU x ~ y he any valid operator of et, then condition iu^ holds for the operator 

InsertU x — y if and only if the output of Algorithm 1.1.1 is True. 

(ii) Let InsertD x ^ y he any valid operator of et, then condition id^, holds for the operator 
InsertD x ^ y if and only if the output of Algorithm 1.1.2 is True. 
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Algorithm 1.1.2: Check the condition ids in Definition 9 

Input; a completed PDAG et and a valid operator; InsertD x y. 
Output; True or False 

1 Insert x y to et, get a PDAG, denoted as Vo, 

2 for each undirected edge u — y in Vo, where u is not adjacent to x do 

3 1^ update Vo by orienting u — y to y u, 

4 for each edge v —> y in Vo do 

5 if V y is not strongly protected in Vo then 

6 1^ update Vo by changing v y to v — y, 

7 Set Vt+i = Vo 

8 for each common child u of x and y in Vt+i do 

9 if y u is not strongly protected in Vt+l then 

10 1^ return False 

11 return True (id-i holds for InsertD x y) 



Algorithm 1.1.3: Check the condition dd2 in Definition 9 

Input; a completed PDAG et and a valid operator DeleteD x y 
Output; True or False 

1 Delete x y from et, get a PDAG, denoted as Vt+l', 

2 for each parent v of y in Vt+i do 

3 if V y is not strongly protected in Vt+i then 

4 1^ return False 

5 return True (dd2 holds for DeleteD x y) 



(iii) Let DeleteD x ^ y he any valid operator of et, then condition dd2 holds for the operator 
DeleteD x ^ y if and only if the output of Algorithm 1.1.3 is True. 

In Tlieorem 2, we show that an algorithm (Algorithm 1.1.1, Algorithm 1.1.2, or Algorithm 
1.1.3) returns True for an operator if and only if the corresponding condition (ius, ids or 
dd2) holds for the operator. Theorem 2 says that we do not have to examine the resulting 
completed PDAG to check conditions ius, ids and dd2, which saves much computation time. 

3.2.2. Time complexity of Algorithm 1 and an accelerated version 

We now discuss the time complexity of Algorithm 1. For et G S^, let p and nt be the number 
of vertices and edges in et respctively, fcj be the number of v-structures in , and k[ be the 
number of undirected v-structures (subgraphs x — y — z with x and z nonadjacent) in et- To 
construct Oet , in Step A of Algorithm 1 (equivalent. Algorithm 1.1), all possible operators we 
need to go through include: ut deleting operators (DeleteU and DeleteD), 3(p(p— l)/2 — nj) 
inserting operators (InsertU and InsertD) when the number of edges in et is less than n, kt 
RemoveV operators and k[ MakeV operators. There are at most Qt = 1.5p(p— 1)— 2nt+fct+fcf 
possible operators for et- Among all conditions in Algorithm 1.1, the most time-consuming 
one, which takes time 0{p + nt) [6], is to look for a path via the depth-first search for an 
operator with type of InsertD. We have that the time complexity of constructing Oet in 
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Algorithm 1.1 is 0{Qt{p + rif)) in the worst case and the time complexity of Algorithm 1 is 
OiJ^tLiQtiP + in the worst case, where N is the length of Markov chain in Algorithm 
1. We know that kt and k[ reach the maxima {p — 2)/2 * floor{p/2) * ceil{p/2) when 
is a evenly divided complete bipartite graphs [14]. Consequently, the time complexity of 
Algorithm 1 are 0{Np'^) in the worst case. Fortunately, when n is a few times of p, say 
n = 2p, all completed PDAGs in Sp are sparse and our experiments show kt and k^ are 
much less than 0{p^) for most completed PDAGs in Markov chain {et}t=i,- - .n- Hence the 
time complexity of Algorithm 1 is approximate 0(Np'^) on average when n is a few times of 
P- 

We can implement Algorithm 1 efficiently when p is not large (less or around 100 in 
our experiments). However, when p is larger, we need large N to guarantee the estimates 
reach convergence. Experiments in Section 4 show N = 10^ is suitable. In this case, cubic 
complexity {0{Np^)) of Algorithm 1 is unacceptable. We need to speed up the algorithms 
for a very large p. 

Notice that in Algorithm 1, we obtain an irreducible and reversible Markov chain {e*} 
and a sequence of numbers {Mt} by checking all possible operators on each ej. The sequence 
{Mt} are used to compute the stationary probabilities of {et} according to Proposition 1. We 
now introduce an accelerated version of Algorithm 1 to generate irreducible and reversible 
Markov chains on S^. The basic idea is that we do not check all possible operators but check 
some random samples. These random samples are then used to estimate {Mt}. 

We first explain some notation used in the accelerated version. For each completed PDAG 
et, if net < Oe""'' is the set of all possible operators on e* with types of InsertU, DeleteU, 
InsertD, DeleteD, MakeV, and RemoveV. If n^^ = n, the number of edges in reaches the 

upper bound n, no more edges can be inserted into et- Let be the set of operators 

obtained by removing operators with types of InsertU and InsertD from Oe""-* . oi^ 
is the set of all possible operators on et when n^^ = n. We can obtain oif^^ and oi;'"^""''*^ 
easily via all possible modified edges introduced in Algorithm 1.1. The accelerated version 
of Algorithm 1 is shown in Algorithm 2. 

In Algorithm 2, (either Oe""'' or oi^ jg the set of all possible operators on 

et, a € (0,1] is an acceleration parameter that determine how many operators in are 

checked, oit"'''^ is a set of checked operators that are randomly sampled without replace- 
ment from Og , and Oet is the set of all perfect operators in 0^e^'^'^^\ When a — 1, Oet — Cet 
and algorithm 2 becomes back to Algorithm 1. 

In Algorithm 2, because the operators in are i.i.d. sampled from O^^ in Step A' and 
operator o is chosen uniformly from Oe^ in Step C', clearly, o is also chosen uniformly from 
Cet- We have that the following Corollary 1 holds according to Proposition 1. 

Corollary 1 (Stationary distribution of {et} on S^). Let , defined in Equation (3.1), 
he the set of completed PDAGs with p vertices and maximum n of edges, Oet, defined in 
Equation (3.2), be the set of operators on Ct, and Mt be the number of operators in O^^ ■ For 
the Markov chain {et} on Sp obtained via Algorithm 1 or Algorithm 2, then 

1. the Markov chain {et} is irreducible and reversible; 

2. the Markov chain {et} has a unique stationary distribution tt and T:{et) oc Mt- 

In Algorithm 2, we provide an estimate of Mt instead of calculating it exactly in Algorithm 
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Algorithm 2: An accelerated version of Algorithm 1. 



Input; 

a S (0, l]:an acceleration parameter; p, n and N , the same as input in Algorithm 1 
Output: 

{et, Mt}t=i ■■■ ,JVi where Mt is an estimation of Mt = {Oe^l 

1 Initialize eo as any completed PDAG in cSp ; 

2 for t <- to do 
Step A': 

if net < " then 



Set O' 



(aii) 



else 



Set O' 



( — insert) 
et 



Set mt = \0'et I 

Randomly sample [amt] operators without replacement from O'^^ to generate a set (p'^'"^'^'') ^ 
where [amt] is the integer closest to amt- 

Check all operators in a,nd choose those satisfying the corresponding conditions in 

Definition 9 to construct a set of operators Oet • 

/* The detailed to check conditions can be found in Section 3.2.1 */ 
Set m*'^' = |det |. If m[°^ = 0, goto Line 9. 



AO) 



Step B': 

I 

L =™* In- 

step C: 

1^ Randomly choose an operator o uniformly from O^t ■ 
Step D: 

1^ Apply operator o to et . Set et+i as the resulting completed PDAG of o. 
18 return {et, Mt}t=i,--- ,N ■ 



1. Let jOg^ I = mt , I — [amt], and \Oet \ — mf^ . Clearly, the ratio ml"' /[amt] is an 

unbiased estimator of the population proportion Mt/rrit via sampling without replacement. 
We can estimate Alt = ICed in Step B' as 

(O) 

Mt = mt^^. (3.4) 

[amt\ 

We have that when [amt] is large, the estimator Mt has an approximate normal distri- 
bution with mean equal to Mt — |CeJ. 

Let the random variable u be uniformly distributed on S^, f{u) be a real function de- 
scribing a property of interest of u, and A be a subset of M. By replacing Mt with Mt in 
Equation (2.6), we estimate P^r ({/(u) G A}) via {et, Mt]t=i,--- ,n as follows. 



K (/(«) e A) = (3-5) 

where Pat (/(u) e A) is defined in Equation (2.5). 
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In the accelerated version, only 100q% of all possible operators on are checked. In Sec- 
tion 4, our experiments on S\q^ show that the accelerated version can speed up the approach 
nearly ^ times, and that Equation (3.5) provide almost the same results as Equation (2.6) 
in which {et, Mt\t=i,--- ,N from Algorithm 1 are used. Roughly speaking, if we set a — 1/p, 
the time complexity of our accelerated version can reduce to 0{Np^). 

4. Experiments 

In this section, we conduct experiments to illustrate the reversible Markov chains proposed 
in this paper and their applications for studying Markov equivalence classes. The main points 
obtained from these experiments are as follows. 

1. For Sp with small p, our proposed approach is thousands of times faster than available 
methods for studying sets of Markov equivalence classes in the literature, such as 
those proposed by Gillespie and Perlman [14], or Pena [30]. For Sp with large p (up 
to 1000), the accelerated version of our proposed approach is also very efficient and 
the estimations in Equations (2.6) and (3.5) converge quickly as the length of Markov 
chain increases. 

2. For completed PDAGs in Sp with sparsity constraint (n is a small multiple of p), we see 
that (i) most edges are directed, (ii) the sizes of maximum chain components (measured 
by the number of vertices) are very small (around ten) even for large p (around 1000), 
and (iii) the number of chain components grows approximately linearly with p. 

These results imply that, under the assumption that the underlying completed PDAG is 
sparse, and that there are no latent or selection variables present, causal inference based on 
observational data is sufficient to recover most causal relationships. Moreover, under these 
assumptions, most chain components are small, so in general few interventions are needed 
to infer the directions of the undirected edges. 

In Section 4.1, we evaluate our methods by comparing the size distributions of Markov 
equivalence classes in Sp with small p to true distributions {p — 3, 4) or Gillispie's results 
{p = 6) [14]. In Section 4.2, we report the proportion of directed edges and the properties of 
chain components of Markov equivalence classes under sparsity constraints. In Section 4.3, 
we show experimentally that Algorithm 2 is much faster than Algorithm 1, and that the 
difference in the estimates obtained is small. Finally, we study the asymptotic properties of 
our proposed estimators in Section 4.4. 

4-1. Size distributions of Markov Equivalence classes in Sp for small p 

We consider size distributions of completed PDAGs in Sp for p = 3, 4, and 6 respectively. 
There are 11 Markov equivalence classes in S3, and 185 Markov equivalence classes in S4. 
Here we can get the true size distributions for S3 and 54 by listing all the Markov equivalence 
classes and calculating the size of each explicitly. Gillespie and Perlman calculate the true 
size probabilities for Sq by listing all classes; these are denoted as GP-values. We estimate the 
size probabilities via Equation (2.6) with the Markov chains from Algorithm 1. We ran ten 
independent Markov Chains using Algorithm 1 to calculate the mean and standard deviation 
of each estimate. The results are shown in Table 2, where N is the sample size (length of 
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Markov chain). We can see that the means are very close to true values or GP- values and 
the standard deviations are also very small. 

Table 2 

Size distributions for Sp with p = 3,4, and 6 respectively. N is sample size, k is the average time 
(seconds) used to generate a completed PDAG, GP-values are obtained by Gillispie and Perlman [14]- 



P=3, N 


= 10", K = 


2 X 10- 


*sec 


p=4, Af=10-*, K = 


3 X 10— *sec 


Size True value 


Mean(Std) 


Size 


TruG valuG 


Mean(Std) 


1 


0.36363 


0.36422(0.00540) 


1 


V./ • _L U C ij 


0.31859(0.00946) 


2 


0.27273 


0.27160(0.00412) 


2 


0.25946 


0.25929(0.00590) 


3 


0.27273 


0.27274(0.00217) 


3 


0.19460 


0.19572(0.00635) 


6 


0.0909 


0.09144(0.00262) 


4 


0.10270 


0.10229(0.00395) 










6 


0.02162 


0.02162(0.00145) 










8 


0.06486 


0.06464(0.00291) 










10 


0.03243 


0.03249(0.00202) 










24 


0.00540 


0.00536(0.00078) 




p=6, N= 


= 10^ K = 


= 6 X lO-^sec 










Size GP-value 


Mean(Std) 


Size 


GP-value 


Mean(Std) 




1 


0.28667 


0.28588(0.00393) 


48 


0.00013 


0.00013(0.00004) 




2 


0.25858 


0.25897(0.00299) 


50 


0.00034 


0.00034(0.00007) 




3 


0.17064 


0.17078(0.00248) 


52 


0.00017 


0.00018(0.00003) 










54 


0.00017 


0.00018(0.00004) 




28 


0.00017 


0.00017(0.00004) 


60 


0.00019 


0.00020(0.00004) 




30 


0.00169 


0.00170(0.00017) 


72 


0.00006 


0.00006(0.00002) 




32 


0.00236 


0.00238(0.00017) 


88 


0.00004 


0.00004(0.00001) 




36 


0.00052 


0.00053(0.00008) 


144 


0.00009 


0.00009(0.00003) 




38 


0.00034 


0.00035(0.00004) 


156 


0.00006 


0.00006(0.00003) 




40 


0.00118 


0.00120(0.00010) 


216 


0.00001 


0.00001(0.00002) 




42 


0.00051 


0.00052(0.00009) 









We implemented our proposed method in Python, and ran it on a computer with a 
2.6 GHZ processor. The average time used to generate a completed PDAG, denoted as 
K, is also reported in Table 2. We can see that the average times (k) for S^, S4, and Sq 
are 2 x 10~^, 3 x 10^'' and 6 x 10~* seconds respectively. In comparison, Pena's method, 
implemented in C++ on a 2.4 GHZ computer, takes 1.872, 2.448, and 3.384 seconds on 
average to generate a sample for p = 3,4, and 6 respectively. 

4-2. Markov equivalence classes with sparsity constraints 

We now study the sets Sp of Markov equivalence classes defined in Equation (3.1). The 
number of vertices p is set to 100, 200, 500 or 1000 and the maximum edge constraint n is 
set to rp where r is the ratio of n to p. For each p, we consider three ratios: 1.2, 1.5 and 3. 
The completed PDAGs in Sp^ are sparse since r < 3. Define the size of a chain component as 
the number of vertices it contains. In this section, we report four distributions for completed 
PDAGs in Sp^: the distribution of proportions of directed edges, the distribution of the 
numbers of chain components, the distribution of the maximum size of chain components, 
and the distribution of the numbers of v-structures. In each simulation, given p and r, 
a Markov chain with length of 10^ on 31,^ is generated via Algorithm 2 to estimate the 
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distributions via Equation (3.5). The acceleration parameter a is set to 0.1,0.05,0.01 and 
0.001 for p = 100, 200, 500 and 1000 respectively. 

In Figure 5, twelve distributions of proportions of directed edges are reported for S^^ with 
different p and ratio r. We mark the minimums, 5% quartiles (Solid circles below boxes), 
1st quartiles, medians, 3rd quartiles and maximums of these distributions. We can see that 
for a fixed p, the proportion of directed edges increases with the number of edges in the 
completed PDAG. For example, when the ratio r=1.2, the medians (red lines in boxes) of 
proportions are near 92%; when the ratio r=1.5, the medians are near 95%; when ratio r=3, 
the medians are near 98%. 







* 


- \ h T ^ 








; 1 i . 








1 * 1 

_i_ 












p=100 p=200 p=500 p=1000 



1.2 1.5 



1.2 1.5 



1.2 1.5 



1.2 1.5 



Fig 5. Distribution of proportion of directed edges in completed PDAGs in Sp^ . The lines in the boxes and 
the solid circles under the boxes indicate the medians and the 5% quartiles respectively. 



The distributions of the numbers of chain components of completed PDAGs in Sp^ are 
shown in Figure 6. We plot the distributions for Sp'^^ in the main window and the distribu- 
tions for r=1.2 and r—3 in two sub- windows. We can see that the medians of the numbers 
of chain components are close to 5, 10 ,20, and 40 for completed PDAGs in 5^'^^ with p = 
100, 200, 500 and 1000 respectively. It seems that there is a linear relationship between the 
number of chain components and the number of vertices p. In the insets, similar results are 
shown in the distributions for r=1.2 and r=3. 

The distributions of the maximum sizes of chain components of completed PDAGs in Sp^ 
are shown in Figure 7. For S^^^ in the main window, the medians of the four distributions 
are approximately 4,5,6 and 7 for p—100, 200, 500, and 1000 respectively. This shows that 
the maximum size of chain components in a competed PDAG increases very slowly with 
p. In particular, from the 95% quartiles (solid circles above boxes), we can see that the 
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60 




p=100 p=200 p=500 p=1000 

r=1.5 



Fig 6. Distributions of numbers of chain components of completed PDAGs in Sp^ . The lines in the boxes 
and the solid circles above the boxes indicate the medians and the 95% quartiles respectively. 



maximum chain components of more than 95% completed PDAGs in Sp^^^ have at most 8, 
9, 10, and 13 vertices for p — 100, 200, 500, and 1000 respectively. This result implies that 
sizes of chain components in most sparse completed PDAGs are small. 

The distributions of the numbers of v-structures of completed PDAGs in S^^ are shown 
in Figure 8. For S^'^^ in the main window, the medians of the four distributions are 108, 220, 
557 and 1110 for p=100, 200, 500, and 1000 respectively. Figure 8 shows that the numbers 
of v-structures are much less than {jp') for most completed PDAGs in when r is set 
to 1.2. 1.5 or 3. This result is useful to analyze the time complexities of Algorithm 1 and 
Algorithm 1.1 in Section 3.2.2. 

4-3. Comparisons between Algorithm 1 and its accelerated version 

In this section, we show experimentally that the accelerated version Algorithm 2 is much 
faster than Algorithm 1 and the difference of estimates based on two algorithms is small. 
We have estimated four distributions on S\qq in Section 4.2 via Algorithm 2. The four 
distributions are the distribution of proportions of directed edges, the distribution of the 
numbers of chain components, the distribution of maximum size of chain components, and 
the distribution of the numbers of v-structures. To compare Algorithm 1 with Algorithm 2, 
we re-estimate these four distributions for completed PDAGs in S\qq via Algorithm 1. 

For each distribution, in Figure 9, we report the estimates obtained by Algorithm 1 
with lines and the estimates obtained by Algorithm 2 with points in the main windows. 
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Fig 7. The distributions of the maximum sizes of chain components of completed PDAGs in Sp^. The lines 
in the boxes and the solid circles above the boxes indicate the medians and the 95% quartiles respectively. 



The differences of two estimates are shown in the sub-windows. The top panel of Figure 9 
displays the cumulative distributions of proportions of directed edges. The second panel of 
this figure displays the distributions of the numbers of chain components. The third panel 
displays the distributions of maximum size of chain components. The bottom panel displays 
the distribution of the numbers of v-structures. We can see that the differences of three pairs 
of estimates are small. 

The average times used to generate a state of the Markov chain of completed PDAGs 
in Sp^P are shown in Table 3, in which a is the acceleration parameter used in Algorithm 
2. If a = 1, the Markov chain is generated via Algorithm 1. The results suggest that the 
accelerated version can speed up the approach nearly ^ times when p — 100. 

Table 3 

The average time used to generate a completed PDAG in cSp where p is the number of vertices, a is the 
acceleration parameter, k is the average time (seconds). 



p 100 100 200 500 1000 

a 1 0.1 0.05 0.01 0.001 

K (seconds) 0.22 0.032 0.113 0.28 0.72 
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Fig 8. The distributions of the numbers of v-structures of completed PDAGs in Sp^ . The red lines in the 
boxes indicate the medians. 



4.4- Asymptotic properties of proposed estimators 

We further illustrate the asymptotic properties of proposed estimators of sparse completed 
PDAGs via simulation studies. We consider Sp^^ for p = 100, 200, 500 and 1000 respectively. 
Let f{u) be a discrete function of Markov equivalence class u, where u is a random variable 
distributed uniformly in Sp^^. Let E(/) be the expectation of f{u), we have 

E(/)=^zP(/-z). 

i 

Proposition 2 shows that the estimator P(/ = i) in Equation (2.6) converges to P(/ — i) 
with probability one. We also have that the estimator defined as 



E(./) = 



converges to E(/) with probability one, where {et, Mt}t=i... is a Markov chain from Al- 
gorithm 1. 

We generate some sequences of Markov equivalence classes {et, Mt} with length oi N = 
1.25 X 10^ via Algorithm 2 and divide each sequence into 250 blocks. Set f{u) to be the size 
of the Markov equivalence class represented by u or the proportion of directed edges in u. 



Y.B. He et al. /Reversible MCMC on markov equivalence classes of sparse DAGs 



25 





Fig 9. Distributions for completed PDAGs in iSjqq estimated via Algorithm 1 (plotted in lines) and the 
accelerated version-Algorithm 2 (plotted in points) are shown in the main windows. The differences are 
shown in sub-windows. Four panels (from top to bottom) display distributions of directed edges, number of 
chain components, maximum size of chain components, and v-structures respectively. 



we estimate E(/) using cumulative data in the first k blocks as 

E(/)fc = 



/ t=i 



where j = 5 x 10'^. The simulation results are shown in Figure 10 and Figure 11. We can see 
that the estimates of both average sizes and proportions of directed edges converge quickly 
as k increases. 
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Average size of Markov equivalence classes 




Fig 10. Four sequences of average sizes of completed PDAGs in Sp' ^ with p = 100,200,500 and 1000, 
estimated via Algorithm 2 and the first 5000/c steps of the Markov chains, where k is shown in x-axis. 



Average proportion of directed edges 

.944 f 




250 



250 



250 



50 100 150 200 250 

k 

Fig 11. Four sequences of average proportions of directed edges in completed PDAGs in Sp'^^ with p = 
100, 200, 500 and 1000, estimated via Algorithm 2 and the first 5000fc steps of the Markov chains, where k 
is shown in x-axis. 
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5. Conclusions 

In this paper, we proposed a reversible irreducible Markov chain on Markov equivalence 
classes that can be used to study various properties of a given set of interesting Markov 
equivalence classes. Our experiments on Markov equivalence classes with sparse constraints 
reveal useful information. For examples, we find that proportions of undirected edges and 
chain components in sparse completed PDAGs are small even for Markov equivalence classes 
with thousands of vertices. 

The sizes of Markov equivalence classes are the property most widely discussed in the 
literature. Due to space constraints, we have omitted several details in this paper about 
determining the size of Markov equivalence classes and calculating further properties of edges 
and vertices. We will discuss this issues in a follow-up paper. The proposed methods can 
potentially be extended to study other sets of completed PDAGs besides 5,^. Some interesting 
sets include (1) the completed PDAGs in which each vertex has at most d adjacent edges; 
(2) completed PDAGs in which each pair of vertices is connected by a path along edges in 
the graph. 
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Appendix A: Preliminary results and algorithms 

In this appendix, we provide some preliminary results and algorithms introduced by Anders- 
son [2], Dor and Tarsi [9], and Chickering [5, 6]. Some definitions and notation are introduced 
first. A graph is called a chain graph if it contains no partially directed cycles [20]. A chord 
of a cycle is an edge that joins two nonadjacent vertices in the cycle. An undirected graph is 
chordal if every cycle of length greater than or equal to 4 possesses a chord. A directed edge 
of a DAG is compelled if it occurs in the corresponding completed PDAG, otherwise, the 
directed edge is reversible and the corresponding parents are reversible parents. The concept 
of strongly protected is introduced in Definition 2 in Section 3.2. Recall that is the set 
of all neighbors of x, is the set of all parent of a;, N^y = n Ny and Vl^^y = Tix{~] Ny. 
The other notation can be found in the beginning of Section 3.1. 

Lemma 2 characterizes completed PDAGs that are used to represent Markov equivalence 
classes [2]. This Lemma was mentioned in Section 1.1 and will be used in the proofs in 
Appendix B. 

Lemma 2 (Anderssen [2]). A graph C is a completed PDAG of a directed acyclic graph T) 
if and only if C satisfies the following properties: 

(i) C is a chain graph; 

(ii) Let Cr he the subgraph induced by t . Ct is chordal for every chain component r; 

(iii) w ^ u — V does not occur as an induced subgraph of C; 

(iv) Every arrow v ^ u in C is strongly protected. 

Algorithm 3 generates a consistent extension of a PDAG [9]. Algorithm 4 creates the 
corresponding completed PDAG of a DAG [5]. They are used to implement Chickering's 
approach. 

Lemma 3 shows the equivalent validity conditions for iu2, dui, id2, ddi and mvi used 
in Definition 9 [6]. In Definition 3, we have introduced the concept of "the validity of an 
operator" introduced by Chickering [6]. 

Lemma 3 (Validity conditions of some operators [6]). The necessary and sufficient validity 
conditions of the operators with type of InsertU, DeleteU, InsertD, DeleteD or MakeV are 
as follows. 

• (InsertU) Let x and y be two vertices that are not adjacent in C. The operator InsertU 
X — y is valid (equivalently, iu2 holds) if and only if (iu2.i) lix = ^y, (iu2.2) every 
undirected path from x to y contains a vertex in N^y ■ 

• (DeleteU) Let x — y be an undirected edge in completed PDAG C. The operator DeleteU 
X — y is valid (equivalently, dui holds) if and only if (dui.i) Nxy is a clique in C. 
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Algorithm 3: (Dor and Tarsi [9]) Generate a consistent extension of a PDAG 

Input; A PDAG P that admits a consistent extension 
Output: A DAG "D that is a consistent extension of V. 

1 Let V := V; 

2 while V is not empty do 
Select a vertex x in V such that (1) x has no outgoing edges and (2) if Nx is not empty, then 

every vertex in Nx is adjacent to all vertices in Nx U Tlx- I* Dor and Tarsi [9] show that a 
vertex x with these properties is guaranteed to exist if V admits a consistent 
extension. */ 
Let all undirected edges adjacent to x be directed toward x in T) 
Remove x and all incident edges from V. 

6 return T) 



• (InsertD) Let x and y be two vertices that are not adjacent in C. The operator InsertD 
X ^ y is valid (equivalently, id2 holds) if and only if (id2.i) 7^ Hj,, (id2.2) ^x.y is 
a clique, (id2.3) every partially directed path from y to x contains at least one vertex 

• (DeleteD) Let x ^ y be a directed edge in completed PDAG C. The operator DeleteD 
of X ^ y is valid (equivalently, ddi holds) if and only j/(ddi.i) Ny is a clique. 

• (MakeV) Let x — z — y be any length-two undirected path in C such that x and y are 
not adjacent. The operator Make V x ^ z y is valid ( equivalently, mvi holds ) if and 
only if (mvi.i) every undirected path between x and y contains a vertex in N^y. 

Appendix B: Proofs of theorems in Section 3 

We will provide a proof of Theorem 1 in Appendix B.l below about a perfect operator set, 
and a proof of Theorem 2 in Appendix B.2 below about the correctness of Algorithm 1.1.1, 
Algorithm 1.1.2 and Algorithm 1.1.3 introduced in Section 3.2. 



Appendix B.l: Proof of Theorem 1 

Let O be the operator set defined in Equation (3.3); to prove Theorem 1, which shows O is a 
perfect operator set, we need to show O satisfies four properties: validity, distinguishability, 
irreducibility and reversibility. Equivalently, we just need to prove Theorem 3-6 as follows: 

Theorem 3. The operator set O is valid. 

Theorem 4. The operator set O is distinguishable. 

Theorem 5. The operator set O is reversible. 

Theorem 6. The operator set O is irreducible. 

Of the above four theorems, the most important and difficult is to prove Theorem 5. We 
now show the proofs one by one. 
Proof of Theorem 3 
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Algorithm 4: (Chickering [5]) Create the completed PDAG of a DAG 

Input; V, a DAG 

Output: The completed PDAG C of DAG V. 

1 Perform a topological sort on the vertices in "D such that for any pair of vertices x and y in D, x must 
precede y if x is an ancestor of y\ 

2 Sort the edges first in ascending order for incident vertices and then in descending order for outgoing 
vertices; Label every edge in D as unknown; 

3 while there are edges labeled unknown in V do 

4 Let X y he the lowest ordered edge that is labeled unknown 

5 for every edge w x labeled compelled do 

6 if w is not a parent of y then 

7 X y and every edge incident into y with compelled 

8 Goto 3 

9 else 

10 ^ Label w y with compelled 

11 if there exists an edge z y such that z = x and z is not a parent of x then 

12 1^ Label x y and all unknown edges incident into y with compelled 

13 else 

14 1^ Label x ^ y and all unknown edges incident into y with reversible 

15 Let C = 'D and undirect all edges labeled "reversible" in C. 

16 return completed PDAG C 



According to the definition of validity in Definition 5 and the definition of Oc in Equation 
(3.2), all operators in InsertUc, DeleteUc, InsertDc, DeleteDc and MakeVc are valid. We 
just need to prove Lemma 4, which shows all operators in RemoveVc are valid. 

Lemma 4. Let x ^ z y be a v-structure in completed PDAG C. If (rvi) XI^^ — Ily, (rv2) 
UxUNxy = Ilz\{x, y}, and (rvs) every undirected path between x and y contains a vertex in 
Nrcy hold, then the operator Remove V x z ^ y is valid and results in a completed PDA G 
in Sp defined in Equation (3.1). 

To prove Lemma 4, we will use Lemma 5 given by Chickering (Lemma 32 in [6]). 

Lemma 5. Let C be any completed PDAG, and let x and y be any pair of vertices that are 
not adjacent. Every undirected path between x and y passes through a vertex in N^y if and 
only if there exists a consistent extension in which (1) x has no reversible parents, (2) all 
vertices in N^y are parents of y, and (3) y has no other reversible parents. 

We now give a proof of Lemma 4. 

Proof. From Lemma 5 and condition rva in 4, there exists a consistent extension of T) in 
which x has no reversible parents and the reversible parents of y are the vertices in N^y. 
Because y ^ z occurs in the completed PDAG C, Nz and Ny occur in different chain 
components. We can orient the undirected edges adjacent to z out of z. Then all vertices in 
Nz are children of z in V. Let V be the graph obtained by reversing y ^ z in T) and V 
be the PDAG obtained by applying the RemoveV operator to C. We will show that V is a 
consistent extension of V' . 

Clearly, V and V' have the same skeleton. 
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We have that any v-structure that occurs in T) but not in V' must include either the edge 
x — >■ z or ?/ — >■ z. Since I? is a consistent extension of C, we have that all v-structures in T) 
are also in C. From condition rv2, all parents of z other than x and y are adjacent to x and 
y. Hence a; — > z y is the only v-structure that is directed into z in C. We have that all 
v-structures of V' are also in I?, and there is only one v-structure x ^ z ^ y that is in V 
but not v. 

Since y — z is the unique edge that differs between V and V , we have that any v- 
structure that exists in V but not in V must include the edge y ^ z, and any v-structure 
that exists in V but not in V must include the edge z y. We have shown that x —i' z ^ y 
is the only v-structure in V that is directed into z. From the construction of V, we have 
that all compelled parents of y in V are also parents of z and all other parents are in N^y; 
from rv2, they also are parents of z. There is no v-structure that includes edge z — > y in 
V . Hence, all v-structures of V are also in D, and there is only one v-structure x ^ z ^ y 
that is in I? but not V . 

Hence, V and V have the same v-structures. It remains to be shown that V is acyclic. 

If V contains a cycle, the cycle must contain the edges z ^ y because T> is acyclic. This 
implies there is a directed path from y to z in V. By construction, all vertices in are 
children of z in I?'. So, this path must include a compelled parent of z, denote it by u. If 
u X, from condition rv2, u G Hy [J N^y] by the construction of V, we have u G Uy. Thus, 
there is no path from y to z that contains u. li u = x, by construction, the path must 
contain a compelled parent v of x. From condition rvi, v £ Uy. Thus, there is no path from 
y to z contains v. We get that V is acyclic. Thus V is a consistent extension of V and the 
operator Remove V x — > z -s— y is valid. □ 

Proof of Theorem 4 

Proof. For any completed C S 5^ , we need to show that different operators in Oc result 
in different completed PDAGs. For any valid operator o S InsertUc, say InsertU x — y, 
denoted as o, the resulting completed PDAG of o contains the undirected edge x — y. We 
have that all other operators in Oc except for InsertD x y and Insert a; ^ y (if they 
are also valid) will result in completed PDAGs with skeletons different than the resulting 
completed PDAG of o. Thus, these operators can not result in the same completed PDAG 
as a. If InsertD a; — > y or Insert a; ^ y is valid, the resulting completed PDAGs of them 
contain a; — y or a; 4— y. These two resulting completed PDAGs have at least a compelled 
edge different than the resulting completed PDAG of a. Thus, there is no operator in Oc 
that can result in the same completed PDAG as o. 

Similarly, we can show for any operator in Oc, different operators will result in dif- 
ferent completed PDAGs, because they will have distinct skeletons, compelled edges, or 
v-structures. □ 

Proof of Theorem 5. 

Let C be any completed PDAG in S'p, o e Oc be an operator on C. The operator o' E O 
is the reversible operator of o if o' can transfer the resulting completed PDAG of a back 
to C. To prove Theorem 5, we just need to show each operator in Oc defined in Equation 
(3.3) has a reversible operator in O. Equivalently, we prove Lemma 6, Lemma 7, Lemma 
8, Lemma 9, Lemma 10, and Lemma 11 to show the reversibility for six types of operators 
respectively. 
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Lemma 6. For any operator a G Oq denoted by "InsertU x — y", the operator "DeleteU 
X — y" is the reversible operator of o. 

Lemma 7. For any operator a G Oc denoted by "DeleteU x — y", the operator "InsertU 
X — y" is the reversible operator of o. 

Lemma 8. For any operator a € Oc denoted by "InsertD x — >■ y", the operator "DeleteD 
X ^ y" is the reversible operator of a. 

Lemma 9. For any operator o G Oc denoted by "DeleteD a: — > y", the operator "InsertD 
X ^ y" is the reversible operator of o. 

Lemma 10. For any operator a G Oc denoted by "MakeV x ^ z ^ y'\ the operator 
"RemoveV X z y" is the reversible operator of o. 

Lemma 11. For any operator a G Oc denoted by "RemoveV x z ^ y", the operator 
"MakeV X —^z^y"is the reversible operator of a. 

Before giving proofs of these six lemmas, We first provide several results shown in Lemma 
12, 14, 13, and Lemma 15. 

Lemma 12. Let graph C be a completed PDAG, {w^v^u} be three vertices that are adjacent 
each other in C. If there are two undirected edges in {w,v,u}, then the third edge is also 
undirected. 

Proof. If the third edge is directed, there is a directed cycle like w — v — u^w. From Lemma 
2, we know that C is a chain graph, so there is no directed circle in C. □ 

Lemma 13. Let Ci be the resulting completed PDAG obtained by inserting a new edge 
between x and y in C. If there is at least one edge v ^ u that is directed in C but not directed 
in Ci , then there exists a vertex h that is common child of x and y such that x ^ h and 
y ^ h in C become undirected in Ci . 

Proof. According to Lemma 2, an edge is directed in a completed PDAG if and only if it 
is strongly protected. Thus, we have that at least one case among (a), (6), (c), (d) in Figure 
1 occurs in C but not in Ci for w — > u. We will show that either Lemma 13 holds or there 
exists a parent of u, denoted as ui, such that U2 — >■ ui occurs in C but not in Ci, where U2 
is a parent of Ui. We denote the latter result as (*). 

Suppose case (a) in Figure 1 occurs in C but not in Ci. Because v ^ u becomes undirected 
in Ci, we have that w ^ v must be undirected in Ci since w and u are not adjacent. Set 
ui = V and U2 — u, and we have that (*) holds. 

Suppose case (b) in Figure 1 occurs in C but not in Ci. If the pair {w, is not {x,y}, 
u — > u ^ w is a v-structure in C. We have that w — > w occurs in Ci. This is a contradiction. 
If {v, w} is {x, y}, we have that Lemma 13 holds {h — u). 

Suppose case (c) in Figure 1 occurs in C but not in Ci. Either v ^ w or w ^ u occurs 
in C but not in Ci. If it is u w, by setting U2 = v and ui = w, we have (*) holds. If it is 
w ^ u, both V — u and w — u in Ci, so x — u also must be in Ci. We also have that (*) holds. 

Suppose case (d) in Figure 1 occurs in C but not in Ci. If the pair {w,wi} is {x,y}, 
Lemma 13 holds (/i = u). Otherwise, w — >■ u must occur in both Ci and C and the 

edge ?; — ^ It is still strongly protected in Ci, yielding a contradiction. 
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If (*) holds, we have that there is a directed path U2 — >■ Ui — > u such that U2 — > ui occurs 
in C but not Ci. Iterating, we can get a directed path Uk — > u^-i • • • — t- u of length fc — 1 
without undirected edges such that Uk — > Wfe-i occurs in C but not in Ci if Lemma 13 does 
not hold in each step. Because C is a chain graph without directed circle, the procedure will 
stop in finite steps and Lemma 13 will hold eventually. □ 

From the proof of Lemma 13, we have that u should be a descendant of x and y, so we 
can get the following Lemma 14. 

Lemma 14. Let C be any completed PDAG, and let V denote the PDAG that results from 
adding a new edge between x and y. For any edge v ^ u in C that does not occur in the 
resulting completed PDAG extended from V , there is a directed path of length zero or more 
from both x and y to u in C. 

Lemma 15. Let InsertUc and DeleteUc be the operator sets defined in Definition 9 re- 
spectively. For any a in InsertUc or in DeleteUc, where V' is the modified graph of o that 
is obtained by applying o to C, we have that V' is a completed PDAG. 

Proof. We just need to check whether V' satisfies the four conditions in Lemma 2. 

(i) : For any o € DeleteUc, denoted as DeleteD x — y, let V be the modified graph 
obtained by deleting x — y from C. 

If there is a directed cycle in V' , it must be a directed cycle in C, which is a contradiction. 
Thus, there is no directed cycle in V' and V' is a chain graph. 

If there exists an undirected cycle of length greater than 3 without a chord in V' , the 
cycle must contain both x and y; otherwise, this cycle occurs in C. If the length of the cycle 
is 4, the other two vertices are in N^y] we have that the cycle has a chord since N^y is a 
clique in C. If the cycle in V' has length greater than 4 without a chord, we have that x — y 
is the unique chord of this cycle in C. However, this would imply that there is a cycle of 
length greater than 3 without a chord in C, a contradiction. Thus, there is no undirected 
cycle with length greater than 3 in V' , so every chain component of V' is chordal. 

Suppose that •—>• — • occurs as an induced subgraph of V'; it must be x — ?> • — y (or 
y — ?> ■ ~ x). However, in this case, a: — > • — y — a; (or y — > ■ ~ x — y) would be a directed cycle 
in C. Thus the induced subgraph like • — ?> does not occur as an induced subgraph of V' . 

Finally, all directed edges in V' will be strongly protected; by the definition of strong 
protection, all directed edges in C will remain strongly protected when an undirected edge 
is removed. 

(ii) : For any o S InsertUc, denoted as InsertU x — y, V' is the modified graph of o. 

If there is a directed cycle in V' , it must contain x — y, otherwise this cycle is also in 
C. We can suppose that there exists a partially directed path from x to y in C. Denote the 
adjacent vertex of y in the path as u. Let u be the vertex adjacent to y in the path. We have 
u ^ Ylyl otherwise, from the condition H^; = Ily in Lemma 3, u would also be in H^;, so there 
would be a partially directed cycle from a; to a; in C. Hence the directed path must have the 
form a; M — y. This would induce a subgraph like a — > 6 — u in C, a contradiction. 

Consequently, V' is a chain graph. 

If there exists an undirected cycle of length greater than 3 without a chord in V' , the cycle 
must contain x and y, and there must be an undirected path from a: to y in C; otherwise, 
the cycle would also be in C From Lemma 3, every undirected path from x to y contains a 
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vertex in N^y, so every undirected path of length greater than two has a chord. Thus, every 
undirected path of length greater than 3 from a; to y in V' has a chord. This implies that 
every chain component of V' is chordal. 

Suppose that a subgraph like • — > — occurs as an induced subgraph oiV' . Since Tlx — Hj, 
in C, the induced subgraph is not ■ ^ x — y (or • — > y — x). Thus, the induced subgraph 
like •—)•• — • also occurs in C. This is a contradict since C is a completed PDAG, yielding a 
contradiction. 

From Lemma 13 and the condition iua in Definition 9, all directed edges in C are also 
directed in Ci. This implies that all directed edges in V are still compelled, and are thus 
strongly protected. 

□ 

We now give proofs of Lemma 6, Lemma 7, Lemma 8, Lemma 9, Lemma 10, and Lemma 
11 one by one. 

Proof of Lemma 6 

Proof. Because the operator "InsertU a: — = o e Oc is valid and Ci is the resulting 
completed PDAG of o, we have that x — y occurs in Ci. We just need to show that the 
common neighbors of x and ?/, denoted as N^y, form a clique in Ci. 

If Nxy is empty set or has only one vertex, the condition that N^y is a clique in Ci holds. 

If there are two different vertices z,u € N^y in Ci, we have that x — z — y and x — u — y 
form a cycle of length of 4 in Ci. The cycle is also in C. Since the edge x — y does not exist 
in C and C is a completed PDAG in which all undirected subgraphs are chordal graphs, we 
have that z — u occurs in C, so z and u are adjacent in Ci. Hence the condition that Nxy is 
a clique in Ci holds. □ 

Proof of Lemma 7 

Proof. We need to show the operator o' := InsertU x — y, satisfies the conditions iui, iu2 
and iu3 in Definition 9 for completed PDAG Ci and that the resulting completed PDAG of 
a' is C. 

The condition iui clearly holds, since x — y exists in Ci but not in C. Lemma 15 implies 
that the graph obtained by deleting x — y from C is the completed PDAG Ci. Thus, the 
graph obtained by inserting x — y into Ci is C. This implies that InsertU a; — ?; is valid and 
the condition iu2 holds. 

Lemma 15 implies that the condition iua also holds. □ 

Proof of Lemma 8 

Proof. I will first show that there is no undirected edge y — w that occurs in both C and Ci . 
li w — y occurs in C, since x and y are not adjacent in C, a; — ?■ w — y does not occur in C. 
There are three possible configurations between x and w in C: (1) x is not adjacent to w, (2) 
w ^ X, and (3) x — w. If a; is not adjacent to w in C, inserting x ^ y will result 'm y ^ w 
in Ci . If w — >■ a; is in C, inserting a; — >■ y will result in w — > y in Ci. If a; — w in C, there is an 
undirected path from y to a;; that is, the first condition for InsertD to be valid, according to 
Lemma 3, does not hold. Thus, we get that there is no undirected edge y — w that occurs 
in both C and Ci . 

For any w G Ny in Ci, the edge between w and y is directed in C; that is, either w — > y or 
y ^ w occurs in C. If y — > w is in C, there are three possible configurations between x and 
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w in C: (1) x is not adjacent to w^'^) w x, and (3) x w. If x and w are not adjacent 
in C, inserting x — ?> y will result in y — > u; in Ci. If w — > a; occurs in C, inserting x — > y is 
not valid for C since there would be a directed path from y to x. If x ^ w occurs in C, w 
is common child of x and y, so from condition ida, y ^ w occurs in Ci and w (jz. Ny in Ci. 
Thus, we have that w ^ y must be in C. 

If there is another vertex v € Ny in Ci, u — j/ must also be in C. If d and w are not 
adjacent, v ^ y w forms a v-structure both in C and in Ci. w ^ y must occur in Ci and, 
consequently, w ^ Ny in Ci yielding a a contradiction. Thus, we know that any two vertices 
in Ny are adjacent in C. Ny is therefore a clique in Ci and the operator DeleteD x y is 
valid for Ci; that is, the condition idi in Definition 9 holds. 

Denote the modified PDAG of operator DeleteD x y of Ci as V' . We need to show 
that the corresponding completed PDAG of V' is C Equivalently, we just need to show V' 
and C have the same skeleton and v-structures. Clearly, V' and C have the same skeleton. If 
there is a v-structure in C, but not in Ci, it must be a; — > u y, where it is a common child 
of X and y. From condition ids in Definition 9, a; — ?> u and y ^ u also occur in Ci, so, these 
v-structures also exist in V' . This implies that all v-structures of C are also in V' . Moreover, 
the v-structures in Ci but not in C must be a: — )■ y i;, where v is parent of y, and x and 
V are not adjacent in Ci. Clearly, after we delete a; — )■ y from Ci, these v-structures will not 
exist in V' . This implies that all v-structures of V are in C. So, V and C have the same 
v-structures. 

For any u — )■ y in Ci, if u — y is in C, w must be parent of x. If x and v are not adjacent, 
inserting a; — > y to C will result in y — > v in Ci. Moreover, x — v — y does not exist in C since 
InsertD a: — > y is a valid operator, and a; — > w — y does not occur in C. Thus, for any v that 
is a parent of y but not a parent of x, the directed edge w — >■ y also occurs in the resulting 
completed PDAG C. That is, the condition id2 in Definition 9 holds. 

□ 

Proof of Lemma 9 

To prove this lemma, we first introduce Lemma 16 and Lemma 17. Let L — {ui,U2, 
• • • , Uk) be a partially directed path from ui to Uk in a graph. A path L2 — (m^, • • • , u^) is 
a sub-path of Li if all vertices in Li are in L and have the same order as in L. We say that 
a partially directed path is shortest if it has no smaller sub-path. 

Lemma 16. Let C be a completed PDAG, and let Li be a partially directed path from y to x 
inC. Then there exists a shortest sub-path of Li, denoted as L2 — y— • • — Ufe —>•••—> a;, 
in which there exists a k such that all edges occurring before Uk in the path are undirected, 
and all edges occurring after Uk are directed. 

Proof. We just need to show that a directed edge must be followed by a directed edge in the 
shortest sub-path. If not, Ui — > Ui^i — occurs in L2. Because C is a completed PDAG, 
Ui and Ui+2 must be adjacent; otherwise Ui+i — ?> Ui+2 occurs in C. If Ui — > Ui+2 occurs in C, 
L2 is not a shortest path. If Ui ^ u,;+2 occurs in C, u^+i -i— Ui^2 must be in C 

□ 

Lemma 17. // the graph Vi obtained by deleting a b from a completed PDAG C can be 
extended to a new completed PDAG, Ci, then we have that for any directed edge x ^ y in 
C, if y is not b or a descendent ofb, then x ^ y occurs in Ci. 
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Proof. Because x ^ y occurs in C, so it is strongly protected in C. If a; — > y does not occur 
in Ci, it is not strongly protected in Ci from Lemma 2. From the definition of strongly 
protected, we know that the four cases in Figure 2 in which w — >■ m is strongly protected 
do not involve any descendant of u. Thus, if x — >■ y is not compelled in Ci, there must 
exist a directed edge w ^ z between two non-descendants of y such that the edges between 
non-descendants of z are strongly protected and w — 2; is no longer strongly protected in Vi . 
Because Vi is obtained by deleting a ^ 6, z is non-descendant of b, we have that w — > is 
strongly protected in Vi, yielding a contraction. □ 

We now give a proof of Lemma 9 

Proof. Since C € Sp, we have nc^ < n. That is, the condition idi in Definition 9 holds for 
InsertD x — ^ y of Ci . 

For any undirected edge w — y in C, x must be parent of w, otherwise the edge between 
y and w is directed. Then deleting x — > y from C will result in w ^ y in Ci. Thus, we have 
that all Ny in C become parents of y in Ci . From the condition dd2 , the parents of y but 
not j; in C are also parents of y in Ci . If there is a partially directed path from y to a; in Ci , 
then the vertex adjacent to y in this path must be a child of y or a vertex that is parent of 
y and a; in C We will show that if the vertex is not a parent of y and a; in C, there exists a 
contradiction. 

If there is a partially directed path from y to a; in Ci, we can find a shortest partially 
directed path like y — mi — — Ufc— )-a; from Lemma 16, denoted as Li. Any directed 
edge, say Ui — >■ u^+i, in Li does not become Ui Ui+i in C. If Li does not include undirected 
edges in Ci, we have that the vertices of Li form a partially directed cycle in C. We just 
need to show that the vertices of the undirected path Li also form a partially directed path 
in C. 

Suppose y -> ui occurs in C. If ui — U2 is undirected in C, then y — > M2 must occur 
in C, consequently, Li will not be shortest in Ci. If U2 — >■ wi occurs in C, there exists a 
v-structure U2 ui ^ y in Ci otherwise U2 and y are adjacent, and Li is not the shortest 
path in Ci. Thus, ui — > U2 must occur in C. In this manner, we get that all edges in 
y — ui — — Ufe— )■•••— >-a; are directed in C and are directed from Ui — Wi+i. This implies 
that there exists a partially directed cycle in C. So, wi must be a parent of y and x in C. 
We have ui £ Qxy and every partially directed path of Ci from y to a; contains at least one 
vertex in Q^y 

Since all vertices in fl^y in Ci are parents of x and y in C, if there are two vertices, say 
wi,W2 G ^xy, that are not adjacent, the subgraph wi — > y -s— W2 could be a v-structure in 
Ci. So, all vertices in Q^y in Ci are adjacent and flxy is a clique. 

We have that the parents of y in Ci {(ny)ci) is in the union of the parents and neighbors 
of y in C ((Hj, IJ Ny)c-^). If there is at least one neighbor u of y inC, u must be child of x in 
C and parent of y in Ci, so parents of x and y are not the same. If there is no neighbor of y 
in C, the parents of y in Ci is the same as in C except those vertices that are parents of x 
that is (IIj, — Ilx)ci = (n^ — ^x)c- At the same time, from Lemma 17, the parents of x in 
Ci are also the parents of a; in C Thus, the parents of x and y are not the same in Ci. From 
Lemma 3, we have that InsertD a: — > y is valid for Ci and the condition id2 holds. 

Denote the modified PDAG of operator InsertD a; — >■ y of Ci asV . We need to show that 
the corresponding completed PDAG of V' is C. Equivalently, we just need to show that T" 
and C have the same skeleton and v-structures. Clearly, V' and C have the same skeleton. 
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A v-structures that is in C but not in Ci must have the form x ^ y ^ u, where u is parent 
of y but not adjacent to x. From condition dd2 in Definition 9, u — ?> y also occurs in Ci, so, 
such a v-structure must also exist in V' . This imphes that all v-structures of C are also in 
V' ■ Moreover, the v-structures in Ci but not in C must have the form x ^ v ^ y, where v 
is a common child of y and x in Ci . Clearly, after we insert x ^ y to Ci, this is no longer 
a v-structure in V' implying that all v-structures of V' are in C. Thus, V' and C have the 
same v-structures. 

Let the modified graph of DeleteD x y from C be T'; we know that V and Ci have the 
same v-structures. Thus, for any u that is a common child of x and j/ in Ci , a; — >■ u y is a 
v-structure in V. This implies that y ^ u occurs in C and the condition Ida hold. □ 

Proof of Lemma 10 

Proof. Since x, z and y are in the same chain component of C, they have the same parent 
set in C. The modified graph of a' has the same skeleton and v-structures as Ci because all 
compelled edges in C remain compelled in Ci . We just need to prove that the operator a' is 
valid, equivalently, to prove that the conditions rmi, rm2 and rm^ hold for Ci. 

We now show that the condition rnii, x and y have the same parents in Ci holds. Because 
X and y have the same parents in C and all directed edges in C occur in Ci, we just need to 
consider the neighbors of x or y. Let w — y he any undirected edge in C, we consider the 
edges between w and x or z. 

1. If both w — z and x — w occur in C, w ~ y and w — x must be undirected in Ci. 

2. If w — z occurs but x — w does not occur in C, z — >■ w and y ^ w must be in Ci. 

3. If X — w occurs but w — z does not occur in C, there is an undirected cycle of length 4 
without a chord in C. Thus, this case will not occur. 

4. If neither w — z nor x ~ w occur in C, and there is no undirected path other than 
w — y — z from w to z in C, then w — y occurs in Ci. If there exists another undirected 
path from w to z, there must exist an undirected path of length 2 like w ~ u' ~- z in C, 
and y is adjacent to u' . In this case, y — w occurs in Ci when x — u' occurs and y ^ w 
occurs when x and u' are not adjacent. 

Thus, there are no neighbors of y in C that become parents of y in Ci; i.e., y has the same 
parents in both Ci and C. Similarly, x has the same parents in both Ci and C. we get x and 
y have the same parents in Ci, and the condition rnii holds. 

All parents of x must also be parents of z in Ci since they are in the same chain component. 
For any w G N^y, w — z also occurs in C; otherwise x — z — y — w — x would form cycle of 
length 4 without a chord. We have w z must be in Ci, otherwise a new v-structure will 
occur in Ci. Thus, we have Ii{x) U N^y C 11(2:) in Ci. 

For any w £ II(z) in Ci, if w € ^{z) in C, it must also be parent of x,y and z in Ci, so 
w G II(a;) in Ci. If w — z is an undirected edge in C, there exist undirected edges w — x and 
w — y in C such that w — ?► z is in Ci . Thus, w G N^y in Ci. We have that w e n(a::) U N^y 
and n(z) C n(a;) U N^y in Ci. Thus, n(z) = n(x) U N^y in Ci and the condition rm2 holds. 

Any undirected path between x and y in Ci will also be an undirected path in C, so, these 
paths contain at least one vertex in N^y in C. From the proof above, any vertex in N^y in C 
is also a vertex of N^y in Ci. Thus, any undirected path between x and y contains a vertex 
in Nxy in Ci and the condition rm^ holds. □ 
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Proof of Lemma 11 

Proof. From Lemma 5 and the condition rma, there exists a consistent extension of C, 
denoted by T>, such that all neighbors of a; in C are children of x in I?, and all neighbors of y 
in C are parents of x in T). Changing ?/ — z to z — > y in 2?, we obtain a new graph I?'. From 
the proof of Lemma 4, we can get that (1) V is a DAG, (2) V' is a consistent extension of 
Ci. Thus, 2? is a consistent extension of the PDAG that results from making the v-structure 
a; — z ^ ?/ in Ci. Thus, we can get C by applying MakeV x — >■ z ^ y to Ci. This implies 
that MakeV a; — > z -s— y is a valid operator of Oi and satisfies the condition mvi . □ 

Proof of Theorem 6. 

In order to prove this theorem, we first prove some results shown in Lemma 18, Lemma 
19 and Lemma 20. 

Lemma 18. For any completed PDAG C containing at least one undirected edge, there 
exists an undirected edge x — y for which N^y is a clique. 

Lemma 19. For any completed PDAG C, if x ^ y occurs in C, then Hj, ^ Hy \ x. 

A proof of Lemma 18 and Lemma 19 can be found in Chickering [6]. 

Lemma 20. For any completed PDAG C containing no undirected edges and at least one 
directed edge, there exists at least one vertex x for which any parent of x has no parent. 

Proof. The following procedure will find the vertex whose parent has no parent. Let a ^ b 
be a directed edge in C, set y = a and x = b. 

1. If IIj, is not empty, choose any vertex u in IIj,, set x = y and y ~ u. Repeat this step 
until we find a directed edge y — >■ a; for which Hy is empty. 

2. Since liy is empty, from Lemma 19, there exists at least one vertex other than x in 
IIj, . If there is a vertex u G Tix and u ^ y such that n„ is not empty, choose a vertex 
in n„, denoted as v and set y = v and x = u, and go to step 1. 

Since C is an acyclic graph with finite vertices, above procedure must end at the step in 
which the parents of x have no parents. □ 

We now show a proof of Theorem 6. 

Proof. We need to show that for any two completed PDAGs Ci,C2 S S, there exists a 
sequence of operators in O such that C2 can be obtained by applying a sequence of operators 
to PDAGs, starting from Ci. Because O is reversible, any operator in O has a reversible 
operator, so we just need to show that any completed PDAG can be transferred to empty 
graph without edges. The procedure includes three basic steps. 

1. Deleting all undirected edges. 

From Lemma 18, for any completed PDAG containing at least one undirected edge, we 
can find an operator with type of DeleteU that satisfies the condition dui in Definition 
9. We can delete an undirected edge with this operator and get a new completed PDAG 
whose skeleton is a subgraph of the skeleton of the initial completed PDAG. Repeating this 
procedure, we can get a completed PDAG, denoted as C,;, which contains no undirected 
edges. 

2. Deleting some directed edges. 
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From Lemma 20, we can find a vertex, denoted as x, whose parents have no parents in the 
completed PDAG Ci. If II^: contains more than two vertices, we can choose a vertex u €lix- 
Because (1) Nx is empty in Ci and (2) any other directed edge v ^ x forms a v-structure 
in Ci, we have that z; — >■ cc is also compelled in the completed PDAG obtained by deleting 
directed edge u — >■ a; from d . We can delete v ^ x from d and get a new completed PDAG 
whose skeleton is a subgraph of the skeleton of the initial one. Thus, the new completed 
PDAG is in S. Repeat this procedure for all other directed edges u' — a; in which v' G H-x 
until there are only two vertices in IIj, in the new completed PDAG, denoted as Cj. 
3. Remove a v-structure. 

The conditions rm i, rm2 and rma hold for the v-structure y — > x ^ w in , so, we can 
remove y x ^ u from Cj and get a new completed PDAG whose skeleton is a subgraph 
of the skeleton of the initial graph. Denote the resulting completed PDAG as Ck , it may still 
contain some undirected edges. 

By repeatedly applying the above the steps in sequence, we can finally obtain a graph 
without any edges. 

□ 

Appendix B.2: Proofs of Theorem 2 

There are three statements in Theorem 2; we prove them one by one below. 
Proof of (i) of Theorem 2 
(If) 

Figure 1 shows the four cases that ensure that an edge is strongly protected. We first 
show that for any edge x ^ u (or y ^ u) , where u is a common child of x and ?/, if x — > w 
is strongly protected in Vt+i by configuration (a), (6), or (c?) in Figure 1 (replace u — u by 
a: — )■ m), it is also directed in et+i- 

Case (1), (2) and (3) in Figure 12 show the sub-structures of Vt+i in which x -> u is 
protected by case (a) , (6) and (d) in Figure 1 respectively, where Vt+i is the modified graph 
obtained by inserting x — y into et- 




Fig 12. strongly protected in Vt+i 
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If a; — > u is protected in Vt+i like case (1) in Figure 12, w ^ x ^ u occurs and w and 
u are not adjacent in Vt+i- If w — > x is undirected in et+i, from Lemma 14, there exists a 
path from y to x. Any parent of x that is in this path must not be a parent of y; otherwise, 
there exists a directed cycle from y to ?/ in et. Hence we have that the parent sets of y and x 
are not equal. This is a contradiction of the condition lix — in Lemma 3. We have that 
■u; — > X and x — >■ u occur in Ct+i. 

If x ^ u is protected in Vt+i by v-structure x — > u lu, like case (2) in Figure 12, 
clearly, the v-structure also occurs in et+i, so x — > u occurs in et+i- 

If X — >■ u is protected in Vi like case (3) in Figure 12, we have that the v-structure 
■u; — 7- u ^ wi also occurs in et+i- If either x — w or u — x is in et+i, we have that wi — x 
and w — >■ X are both in et+i and the v-structure Wi -> x w occurs. Hence we have have 
that X — >■ u occur in Cj+i. 

Now we show that if x — > u is protected in Vt+i like (c) in Figure 1, it is also protected 
in et+i- For any ui in x — > mi ^ u, there are only two cases: ui and y are adjacent or 
nonadjacent. 

When Ml and y are not adjacent, like (4) in Figure 12, there is a v-structure ui ^ u ^ y 
in Vt+i - Then ui u occurs in et+i- If x — u occurs in e^+i, by Lemma 12, the edge between 
X and Ml must be directed and oriented as mi — )■ x in et+i - This is impossible, because there 
exists some extension of Vt+i that has an edge oriented as x — > ui. Thus, x — m occurs in 
et+i. 

When ui and y are adjacent, we have that ui — >■ y and ui — y do not occur in Pt+i 
since = Py must hold in et for the validity of the operator InsertU x — y. Hence we have 
that y — 7> Ml occurs in Pt+i and x — ?> u is strongly protected like case (5) in Figure 12. We 
consider two cases: x — >■ mi occurs or does not occur in et+i- 

Assume x — > mi occurs in Ct+i- If ui — >■ m occurs in ct+i, clearly, x — > m must occur in 
et+i because there is a partially directed path x — )• mi — )• m in ct+i- If mi — > m is undirected 
in et+i, from Lemma 12, x — >■ m must occur in et+i- 

In case (5), we have that Ui is also a common child of x and y, so, x — mi will also be 
strongly protected in Pt+i from the condition iu^. Now, consider x — > mi; if it is protected 
in Pt+i like any of case (1), (2), (3), or (4), then, by our proof, x — > ui occurs in et+i- Thus, 
X ^ M must occur in et+i - If x — > mi is protected in Pt+i like case (5), we can find another 
vertex U2 that is a common child of x and y like case (6). From the proof above, we know 
if X — >■ M2 occurs in 64+1, x — > Mi and x — J' m also occur in et+i- Since the graph has finite 
vertices, we can find a common child of x and y, say Uk, such that x — >■ Mfe is protected 
in Pt+i like one of cases (1), (2), (3) or (4). Thus, x — >■ m^ occurs in et+i, implying that 
X — )■ Uk-i occurs in Pt+i, so, finally, x — u occurs in Pt+i- 

(Only if) From Lemma 15, we have that the modified graph Pt+i is also the resulting 
completed PDAG et+i- Hence, all directed edges in et+i are strongly protected in Pt+i, so 
the Algorithm 1.1.1 will return True. 

□ 

Proof of (ii) of Theorem 2 

To prove (ii) of Theorem 2, we need following lemma. 

Lemma 21. Let et be a completed PDAG, Pt+i be the PDAG obtained in Algorithm 1.1.2 
with input of a valid operator InsertD x — > and et+i be the resulting completed PDAG 
extended from Vt+i . We have: 
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1. If u is not a common child of x and y, then all directed edges y ^ u in Vt+i are also 
in et+i- 

2. All directed edges v ^ y in Vt+i are also in Ct+i; 
Proof (1) 

If u is not a common child of x and y, and y ^ u occurs in Vt+i, we have that there is 
a structure Uke a; — > y — > m in Vt+i- Because x ^ y occurs in Cf+i, y ^ u must be in e^+i 
too. 

(2) 

From Algorithm 1.1.2, all directed edges v ^ y are strongly protected in Vt+i- When v 
is not adjacent to x in et+i, w — >■ y 4— x is a v-structure, so u — >■ ?/ occurs in Ct+i- When v 
is adjacent to x, we show below that if u — >■ y is strongly protected like one of four cases in 
Figure 13, it is also strongly protected in et+i. 



(1) V 




Fig 13. strongly protected in Vt+i- 

In case (1) of Figure 13, because there is no path from y to we have that w — > w occurs 
in Ct+i from Lemma 14. Hence we have that w — > ?/ occurs in Ct+i- 

In case (2), there is a v-structure w — ?/ -s— u in Vt+i- So, v ^ y occurs in Ct+i- 

In case (3), because there is no path from y to u, we have that w — )■ u occurs in ej+i 
according to Lemma 14. If m — >■ y occurs in et+i, v ^ y occurs in et+i- li u ^ y become 
u — y in et+i, v ^ y must also be in in et+i from Lemma 12. 

From the proof of (i) of Theorem 2, we also have that f — >■ y must be in ct+i when case 
(4) occurs in Vt+i- 

Notice that the above proof also holds when we replace x — v hy a, directed edge or add 
an edge between x and w{ or u). Hence we have that w — > ?/ in Vt+i also occurs in Ct+i- □ 

We now give a proof for (ii) of Theorem 2. 
(If) 

We need to consider four cases in Figure 1 in which y — > u is strongly protected in Vt+i- 
Similar to the proof of (i) of Theorem 2, we first prove that the theorem holds in the first 
three cases in Figure 1, which correspond to the cases (1)', (2)' and (3)' shown in Figure 14. 
Notice that the following proof holds for any configuration of the edge between x and w. 

Consider the case (1)' in Figure 14. From Lemma 21, w ^ y occurs in et+i- We have that 
X ^ u must occur in Ct+i- 

Because there is a v-structure w — > m <— y in case (2)', we have that w ^ u y also 
occurs in et+i- 

After implementing Algorithm 1.1.2, if case (3)' occurs in Vt+i, we have that y — w is 
not strongly protected in Vt+i and the edge between y and w have opposite directions in 
different consistent extensions of Vt+i- Hence y — w occurs in ct+i- Similarly, y — wi also 
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Fig 14. Five cases in which x u or y u is strongly protected. 



occurs in et+i- Moreover, the v-structure w — > u -s— wi occurs in et+i- We have that y ^ u 
is strongly protected and occurs in e^+i. 

We now just need to show that a directed edge y ^ u that is strongly protected in Pt+i 
like case (4)' {x and ui are nonadjacent) or (4)" {x and ui are adjacent) in Figure 14 is also 
directed in ct+i- 

In case (4)', from delete Lemma 21, y ^ ui occurs in et+i. Moreover, a: — > u -f- ui is a 
v-structure, so Ui — )■ u also occurs in Ci. So we have y ^ u must occur in et+i- 

In case (4)", we have that ui is also a common child of x and y; hence, y ui will also 
be strongly protected in Vt+i from the condition of this Theorem. Consider y — )■ ui; if it 
is protected in Vt+i like at least one case other than (4)", from our proof, y — ^ ui is also 
compelled in et+i, so y — u must be compelled in ct+i- If y — > mi is protected in Vt+i like 
case (4)", we can find another vertex U2 that is a common child of y and x; from the proof 
above, we know if y — >■ W2 is directed in et+i, y — > ui and y ^ u are directed too. Since the 
graph has finite vertices, we can find a common child of x and y, say u^, such that Uk is 
protected in Vt+i like at least one case other than (4)". It is compelled in et+i, so we can 
get y — >■ Uk-i is compelled in Vt+i, so, finally, y — ?> m is also compelled in Vt+i- We have 
that y ^ u must occur in Ct+i and id^ holds. 

(Only if)Let u be a common child of x and y in Cj. If condition id^ holds for a valid 
operator InsertD x — )■ y, we have that y — > u in occurs in Ct+i and is strongly protected 
in et+i- We need to show that y u must be strongly protected in 'Pt+l^ obtained in 
Algorithm 1.1.2. From the proof of this statement above, we know we just need to consider 
the five configurations in which y — >■ u is strongly protected in et+i in Figure 14. 

We know that v-structures in et+i occur in Vt+i therefore, the v-structure in the cases 
(2)', (3)' and (4)' in et+i must occur in Vt+i too. 

For case (2)', y — >• u is also strongly protected in T't+i, since the v-structure y ~> u i- w 
occurs in Vt+i- 

For case (3)', we have that (1) the v-structure wi — ^ w w occurs in Vt+i', (2) et+i and 
Vt+i have the same set of v-structures. Hence the v-structure wi — y ^ w does not occur 
in Vt+i- We have that y — !■ u is also strongly protected in Vt+i for any configuration of 
edges between wi, y and w. 
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For case (4)', from Algorithm 1.1.2, y ^ ui occurs in T'f+i. Hence y — > m is strongly 
protected in Vt+i- 

Because the valid operator "Insert x y" satisfies condition ids, from Lemma 8, we have 
that the operator "Delete x — >■ j/", when applied to ej+i, results in e*. From the condition 
dd2, any directed edge u — > y in et+i also occurs in et- For case (1)', we have that v ^ y ^ u 
is strongly protected in Vt+i- 

Consider the case (4)", we have that v-structures x ^ u ^ y and x ui y occur 
in et since et is the resulting completed PDAG of the operator "Delete x — y" from et+i. 
According to Algorithm 1.1.2, x^y, x^u^y and cc — >■ tti — >■ y occur in Vt+i- We 
have that u — ui does not occur in et, otherwise u ui occurs in at least one consistent 
extension of Vt+i and consequently ui — ?> m does not occur in et+i- To prove that y ^ u 
is strongly protected in et+i, we need to show that Mi — >■ m occurs in et- Equivalently, we 
show ui — u does not occur in et. If ui — u occurs in a chain component denoted by t in 
et, we have that neither x nor y are in r. The undirected edges adjacent to x or y are in 
chain components different to r. Hence ids holds for the operator "Insert x ^ y" , and all 
parents of t occur in et+i too. We have that ui ~ u occurs in et+i too. It's a contradiction 
that Ml — )■ y occurs in et+i- □ 
Proof of (iii) of Theorem 2 

(If) 

Since Algorithm 1.1.3 returns True, all directed edges like u — >■ y are strongly protected 
in Vt+i- Consider the four configurations in which w — y is strongly protected in 7^t+i in 
Figure 15. Notice that Vt+i is obtained by deleting x — > y from completed PDAG C, by 
Lemma 17, all directed edges with no vertices being descendants of y (excluding y) in Vt+i 
will occur in et- 

Hence, we have the edges w ^ v in case (1), and v ^ w in case (3) will remain in et+i. 
We have v ^ y in case (1) and case (3) must occur in et+i. Because v-structures in case (2) 
and case (4) will also remain in Cf+i, z; ^ y in case (2) and case (4) must occur in Cf+i too. 



(!):■« >- y (2):v {3) : v ^y (4) : v ^y^(wy^wi) 

/ \ / \ 



Fig 15. Four configurations of v y being strongly protected. 



(Only if) If condition dd2 holds for a valid operator DeleteD x —?' y, all edges like w — >■ y 
{v 7^ x) in et will occur in et+i. w — y must be strongly protected in et+i. Consider the 
four configurations in which u — >■ y is strongly protected in et+i as Figure 15. We know that 
v-structures in et+i must occur in et; consequently, all directed edges in et+i must occur in 
et; they also occur in Vt+i - From Lemma 17, w — v — wi in case (4) in Figure 15 must be in 
■pt+i, so an edge v ^ y that is strongly protected in et+i is also strongly protected in "Pt-i-i- 

□ 



